Sir:

Further to the Notice of Appeal filed December 8, 2003, and received by the USPTO on 
December 10, 2003, herewith are three copies of Appellants' Brief on Appeal. Authorized fees 
include the $330.00 fee for the filing of this Brief.



This is an appeal from the decision of the Examiner finally rejecting Claims 25-33, 39, 
41, 43, 44, and 45 of the above-identified application. 



(1) REAL PARTY IN INTEREST 

The above-identified application is assigned of record to Incyte Pharmaceuticals, Inc., 
(now Incyte Corporation, formerly known as Incyte Genomics, Inc.) (Reel 8841, Frame 0213) 
which is the real party in interest herein. 




Doc No. 118717






09/745,506 



Docket No.: PF-0300-3 CON 



(2) RELATED APPEALS AND INTERFERENCES 

Appellants, their legal representative and the assignee are not aware of any related 
appeals or interferences which will directly affect or be directly affected by or have a bearing on 
the Board's decision in the instant appeal. 

(3) STATUS OF THE CLAIMS 

Claims rejected: Claims 25-33, 39, 41, 43, 44, and 45 

Claims allowed: (none) 

Claims canceled: Claims 1-22 and 42 

Claims withdrawn: Claims 23-24, 34-38, and 40 

Claims on Appeal: Claims 25-33, 39, 41, 43, 44, and 45 (A copy of the claims 
on appeal, as amended, can be found in the attached Appendix.) 



(4) STATUS OF AMENDMENTS AFTER FINAL 

The Amendment after Final Rejection under 37 C.F.R. § 1.116 filed December 8, 2003
has been entered for purposes of this appeal. In a telephone voicemail message to Appellants' 
representative on January 28, 2004, the Examiner stated that the amendment would be entered 
upon filing of an appeal. 



(5) SUMMARY OF THE INVENTION 

Appellants' invention is directed, inter alia, to an isolated polynucleotide encoding a 
regulatory protein (NHRP), in particular to the elected polynucleotide encoding NHRP-37 (SEQ 
ID NO:74). The claimed polynucleotide has a variety of utilities, in particular in expression 
profiling, and in particular for diagnosis of conditions or diseases characterized by expression of 
NHRP, for toxicology testing, and for drug discovery. (See the Specification at, e.g., page 55, 
line 4 through page 60, line 27.) As described in the Specification (page 32, lines 18-30): 

NHRP-37 (SEQ ID NO:37) was first identified in Incyte Clone 2507014 
from the CONUTUT01 cDNA library using a computer search for amino acid 
sequence alignments. A consensus sequence, SEQ ID NO:74, was derived from 
the extended and overlapping nucleic acid sequences: Incyte Clones 2507014 
(CONUTUT01), 1394758 (THYRNOT03), 1650580 (PROSTUT09), 2152990 
(BRAINOT09), 2361374 (LUNGFET03) and 2602153 (UTRSNOT10). 

In one embodiment, the invention encompasses a polypeptide comprising 
the amino acid sequence of SEQ ID NO:37. NHRP-37 is 350 amino acids in 
length and has two potential glycosylation sites at N147 and N185, and several
potential phosphorylation sites at S9, S17, T80, T122, S171, T174, T187, T237,
S293, S313, T315, S329, S340, and T342. NHRP-37 has sequence homology
with S. cerevisiae, GI 1322869, and is associated with cDNA libraries which are
immortalized or cancerous and show inflammatory or immune responses.

(6) ISSUES 

1. Whether Claims 25-33, 39, 41, 43, 44, and 45 directed to polynucleotides meet the 
utility requirement of 35 U.S.C. § 101. 

2. Whether one of ordinary skill in the art would know how to use the polynucleotides of 
Claims 25-33, 39, 41, 43, 44, and 45, e.g., in toxicology testing, drug development, and the 
diagnosis of disease, so as to satisfy the enablement requirement of 35 U.S.C. § 112, first 
paragraph, with respect to the utility rejection. 

3. Whether one of ordinary skill in the art would know how to make and use a 
polynucleotide of Claims 25, 28-30, 32, 33, 39, 41, and 43-45, e.g., in toxicology testing, drug 
development, and the diagnosis of disease, so as to satisfy the enablement requirement of 35 
U.S.C. §112, first paragraph, with respect to polynucleotides encoding variants of SEQ ID 
NO:37, polynucleotides encoding fragments of SEQ ID NO:37, polynucleotide variants of SEQ 
ID NO:74, fragments of SEQ ID NO:74, fragments of polynucleotide variants of SEQ ID NO:74,
complementary polynucleotide sequences and RNA equivalents to the above. 

4. Whether the polynucleotides of Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 45 meet 
the written description requirement of 35 U.S.C. §112, first paragraph, with respect to allegedly 
new matter. 

5. Whether the claimed polynucleotide comprising a naturally-occurring polynucleotide 
sequence at least 95% identical to the polynucleotide sequence of SEQ ID NO:74 or the claimed 
polynucleotide encoding a polypeptide comprising a naturally-occurring amino acid sequence 
at least 95% identical to the amino acid sequence of SEQ ID NO:37 of Claims 25, 28, 29, 30, 32,
33, 39, 41, 44, and 45 meet the written description requirement of 35 U.S.C. §112, first 
paragraph. 

6. Whether the polynucleotides of Claims 25, 28, 29, 30, 32, 33, 39, 41, and 42 are
unpatentable under the judicially created doctrine of obviousness-type double patenting over 
Claims 1, 5, 6, and 7 of co-pending Application Serial No. 09/539,800. 



(7) GROUPING OF THE CLAIMS 

As to Issue 1 

This issue pertains to Claims 25-33, 39, 41, 43, 44, and 45. 
As to Issue 2 

This issue pertains to Claims 25-33, 39, 41, 43, 44, and 45. 
As to Issue 3 

This issue pertains to Claims 25, 28-30, 32, 33, 39, 41, and 43-45. 
As to Issue 4 

This issue pertains to Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 45. 
As to Issue 5 

This issue pertains to Claims 25, 28, 29, 30, 32, 33, 39, 41, and 44-45. 
As to Issue 6 

This issue pertains to Claims 25, 28, 29, 30, 32, 33, 39, 41, and 42. 



(8) APPELLANTS' ARGUMENTS 

Issue 1: Utility Rejection of Claims 25-33, 39, 41, 43, 44, and 45 

Claims 25-33, 39, 41, 43, 44, and 45 stand rejected under 35 U.S.C. §§ 101 and 112, first 
paragraph, based on the allegation that the claimed invention lacks patentable utility. The 
rejection alleges in particular that "the claimed invention is not supported by either a specific, 
substantial, credible asserted utility or a well-established utility." (Final Office Action, page 2.) 

The rejection of Claims 25-33, 39, 41, 43, 44, and 45 is improper, as the inventions of 
those claims have a patentable utility as set forth in the instant specification, and/or a 
utility well known to one of ordinary skill in the art. 

The invention at issue is a polynucleotide corresponding to a gene that is expressed in 
human tissue. The claimed invention has numerous practical, beneficial uses in toxicology 
testing, drug development, and the diagnosis of disease, none of which requires knowledge of 
how the polypeptide coded for by the polynucleotide actually functions. As a result of the 
benefits of these uses, the claimed invention already enjoys significant commercial success. 

Appellants previously submitted (in unexecuted form on January 27, 2003 and in
executed form on February 20, 2003) the First Declaration of Tod Bedilion describing some of
the practical uses of the claimed invention in gene and protein expression monitoring 
applications. The First Bedilion Declaration demonstrates that the positions and arguments made 
by the Patent Examiner with respect to the utility of the claimed polynucleotide are without 
merit. 

The First Bedilion Declaration describes, in particular, how the claimed expressed
polynucleotide can be used in gene expression monitoring applications that were well-known at
the time the patent application was filed, and how those applications are useful in developing
drugs and monitoring their activity. Dr. Bedilion states that the claimed invention is a useful tool
when employed as a highly specific probe in a cDNA microarray:

Persons skilled in the art would appreciate that cDNA microarrays that contained 
the SEQ ID NO:74 polynucleotide would be a more useful tool than cDNA 
microarrays that did not contain the SEQ ID NO:74 polynucleotide in connection
with conducting gene expression monitoring studies on proposed (or actual) drugs
for treating immune responses and cancers for such purposes as evaluating their
efficacy and toxicity. (First Bedilion Declaration, ¶ 15.)

Appellants further submit three additional expert Declarations under 37 C.F.R. § 1.132, 
with respective attachments, and ten (10) scientific references filed before or shortly after the 
June 6, 1997 priority date of the instant application. 

The First Bedilion Declaration, Rockett Declaration, Iyer Declaration, Second Bedilion
Declaration, and the references fully establish that, prior to the June 6, 1997 filing date of the
parent Lai '870 application, it was well-established in the art that:

polynucleotides derived from nucleic acids expressed in one or more 
tissues and/or cell types can be used as hybridization probes - that is, as tools - 
to survey for and to measure the presence, the absence, and the amount of 
expression of their cognate gene; 

with sufficient length, at sufficient hybridization stringency, and with 
sufficient wash stringency - conditions that can be routinely established -
expressed polynucleotides, used as probes, generate a signal that is specific to the 
cognate gene, that is, produce a gene-specific expression signal; 

expression analysis is useful, inter alia, in drug discovery and lead 
optimization efforts, in toxicology, particularly toxicology studies conducted early 
in drug development efforts, and in phenotypic characterization and 
categorization of cell types, including neoplastic cell types; 

each additional gene-specific probe used as a tool in expression analysis
provides an additional gene-specific signal that could not otherwise have been 
detected, giving a more comprehensive, robust, higher resolution, statistically 
more significant, and thus more useful expression pattern in such analyses than 
would otherwise have been possible; 

biologists, such as toxicologists, recognize the increased utility of more 
comprehensive, robust, higher resolution, statistically more significant results, and 
thus want each newly identified expressed gene to be included in such an 
analysis; 

nucleic acid microarrays increase the parallelism of expression 
measurements, providing expression data analogous to that provided by older, 
lower throughput techniques, but at substantially increased throughput; 

accordingly, when expression profiling is performed using microarrays, 
each additional gene-specific probe that is included as a signaling component on 
this analytical device increases the detection range, and thus versatility, of this 
research tool; 

biologists, such as toxicologists, recognize the increased utility of such 
improved tools, and thus want a gene-specific probe to each newly identified 
expressed gene to be included in such an analytical device; 

the industrial suppliers of microarrays recognize the increased utility of 
such improved tools to their customers, and thus strive to improve salability of 
their microarrays by adding each newly identified expressed gene to the 
microarrays they sell; 

it is not necessary that the biological function of a gene be known for 
measurement of its expression to be useful in drug discovery and lead 
optimization analyses, toxicology, or molecular phenotyping experiments; 

failure of a probe to detect changes in expression of its cognate gene does 
not diminish the usefulness of the probe as a research tool; and 

failure of a probe completely to detect its cognate transcript in any single 
expression analysis experiment does not deprive the probe of usefulness to the 
community of users who would use it as a research tool. 

Appellants file herewith: 

1. the Declaration of John C. Rockett, Ph.D., under 37 C.F.R. § 1.132, with Exhibits 
A-Q (hereinafter the "Rockett Declaration"); 

2. the Second Declaration of Tod Bedilion, Ph.D., under 37 C.F.R. § 1.132 
(hereinafter the "Second Bedilion Declaration"); 

3. the Declaration of Vishwanath R. Iyer, Ph.D., under 37 C.F.R. § 1.132 with
Exhibits A-E (hereinafter the "Iyer Declaration"); and 

4. ten (10) references published before or shortly after the June 6, 1997 filing date of 
the priority Lai '870 application:

a) PCT application WO 95/21944, SmithKline Beecham Corporation, 
Differentially expressed genes in healthy and diseased subjects (August 17, 1995) (Reference 
No. 1) 

b) PCT application WO 95/20681, Incyte Pharmaceuticals, Inc., Comparative 
gene transcript analysis (August 3, 1995) (Reference No. 2) 

c) M. Schena et al., Quantitative monitoring of gene expression patterns with 
a complementary DNA microarray, Science 270:467-470 (October 20, 1995) (Reference No. 3) 

d) PCT application WO 95/35505, Stanford University, Method and 
apparatus for fabricating microarrays of biological samples (December 28, 1995) (Reference No. 
4) 

e) U.S. Pat. No. 5,569,588, M. Ashby et al., Methods for drug screening 
(October 29, 1996) (Reference No. 5) 

f) R. A. Heller et al., Discovery and analysis of inflammatory disease-related
genes using cDNA microarrays, Proc. Natl. Acad. Sci. USA 94:2150 - 2155 (March 1997) 
(Reference No. 6) 

g) PCT application WO 97/13877, Lynx Therapeutics, Inc., Measurement of 
gene expression profiles in toxicity determinations (April 17, 1997) (Reference No. 7) 

h) Acacia Biosciences Press Release (August 11, 1997) (Reference No. 8) 

i) V. Glaser, Strategies for Target Validation Streamline Evaluation of 
Leads, Genetic Engineering News (September 15, 1997) (Reference No. 9) 

j) J. L. DeRisi et al., Exploring the metabolic and genetic control of gene 
expression on a genomic scale, Science 278:680 - 686 (October 24, 1997) (Reference No. 10) 

The law has never required knowledge of biological function to prove utility. It is the 
claimed invention's uses, not its functions, that are the subject of a proper analysis under the 
utility requirement. 

In any event, as demonstrated by the First Bedilion Declaration, the Rockett Declaration,
the Iyer Declaration, and the Second Bedilion Declaration, the person of ordinary skill in the art
can achieve beneficial results from the claimed polynucleotide in the absence of any knowledge
as to the precise function of the protein encoded by it. The uses of the claimed polynucleotide in
gene expression monitoring applications are in fact independent of its precise biological 
function. 

The Final Office Action is replete with arguments made and positions taken for the first 
time in a misplaced attempt to justify the rejections of the claims under 35 U.S.C. §§ 101 and 
112. This is particularly so with respect to the substantial, specific and credible utilities
disclosed in the Lai '870 application relating to the use of the SEQ ID NO:74 polynucleotide for
gene expression monitoring applications. Such gene expression monitoring applications are 
highly useful in drug development and in toxicity testing. 

The Final Office Action's new positions and arguments include that gene expression 
monitoring results obtained using the claimed polynucleotide as a control would allegedly be 
"uninformative," allegedly "would not add any information to a gene expression or toxicology 
assay regarding the response of the cell to the drug or toxic chemical," or are otherwise 
insufficient to constitute substantial, specific and credible utilities for the claimed polynucleotide 
(Final Office Action, e.g., page 15). In addition, the Final Office Action asserts that the 
Declaration of Dr. Tod Bedilion is insufficient to overcome the rejections, because it allegedly 
"does not present any concrete evidence for a specific and substantial utility for the claimed 
polynucleotides or the polypeptides encoded therefrom, and does not add any evidence to the 
instant disclosure regarding the specific and substantial utility of the claimed polynucleotide." 
(Final Office Action, page 11.) 

Under the circumstances, Appellants are submitting with this Appeal Brief the 
Declaration of John C. Rockett, Ph.D., under 37 C.F.R. § 1.132, with attached Exhibits A - Q; 
the Declaration of Vishwanath R. Iyer, Ph.D., under 37 C.F.R. § 1.132 with attached Exhibits A- 
E; the Second Declaration of Tod Bedilion, Ph.D., under 37 C.F.R. § 1.132; and ten references 
published before or shortly after the June 6, 1997 priority date of the instant application. As we 
will show, the Rockett Declaration, the Iyer Declaration, the Second Bedilion Declaration, and 
the accompanying references show the many substantial reasons why the Examiner's new 
positions and arguments with respect to the use of the claimed SEQ ID NO:74 polynucleotide in 
gene expression monitoring applications are without merit. 

The fact that the Rockett, Iyer, and Second Bedilion Declarations, along with the
accompanying references, are being submitted in response to positions taken and arguments
made for the first time in the Final Office Action, including arguments disregarding the
persuasiveness of the First Bedilion Declaration, constitutes by itself "good and sufficient
reasons" under 37 C.F.R. § 1.195 why these Declarations and references were not earlier 
submitted and should be admitted at this time. Appellants also note that the submitted 
Declarations and references are responsive to the new utility rejection as framed by the Board of 
Appeals in copending cases with similar issues. 

I. The applicable legal standard 

To meet the utility requirement of sections 101 and 112 of the Patent Act, the patent
applicant need only show that the claimed invention is "practically useful," Anderson v. Natta,
480 F.2d 1392, 1397, 178 USPQ 458 (CCPA 1973), and confers a "specific benefit" on the
public. Brenner v. Manson, 383 U.S. 519, 534-35, 148 USPQ 689 (1966). As discussed in a
recent Court of Appeals for the Federal Circuit case, this threshold is not high:

An invention is "useful" under section 101 if it is capable of providing 
some identifiable benefit. See Brenner v. Manson, 383 U.S. 519, 534 [148 USPQ 
689] (1966); Brooktree Corp. v. Advanced Micro Devices, Inc., 977 F.2d 1555, 
1571 [24 USPQ2d 1401] (Fed. Cir. 1992) ("to violate Section 101 the claimed 
device must be totally incapable of achieving a useful result"); Fuller v. Berger, 
120 F. 274, 275 (7th Cir. 1903) (test for utility is whether invention "is incapable 
of serving any beneficial end"). Juicy Whip Inc. v. Orange Bang Inc., 51 
USPQ2d 1700 (Fed. Cir. 1999). 

While an asserted utility must be described with specificity, the patent applicant need not 
demonstrate utility to a certainty. In Stiftung v. Renishaw PLC, 945 F.2d 1173, 1180, 20
USPQ2d 1094 (Fed. Cir. 1991), the United States Court of Appeals for the Federal Circuit 
explained: 

An invention need not be the best or only way to accomplish a certain 
result, and it need only be useful to some extent and in certain applications: 
"[T]he fact that an invention has only limited utility and is only operable in 
certain applications is not grounds for finding lack of utility." Envirotech Corp. v. 
Al George, Inc., 730 F.2d 753, 762, 221 USPQ 473, 480 (Fed. Cir. 1984). 

The specificity requirement is not, therefore, an onerous one. If the asserted utility is 
described so that a person of ordinary skill in the art would understand how to use the claimed 
invention, it is sufficiently specific. See Standard Oil Co. v. Montedison, S.p.a., 212 U.S.P.Q. 
327, 343 (3d Cir. 1981). The specificity requirement is met unless the asserted utility amounts to 
a "nebulous expression" such as "biological activity" or "biological properties" that does not 
convey meaningful information about the utility of what is being claimed. Cross v. Iizuka, 753
F.2d 1040, 1048 (Fed. Cir. 1985). 

In addition to conferring a specific benefit on the public, the benefit must also be 
"substantial." Brenner, 383 U.S. at 534. A "substantial" utility is a practical, "real-world" 
utility. Nelson v. Bowler, 626 F.2d 853, 856, 206 USPQ 881 (CCPA 1980). 

If persons of ordinary skill in the art would understand that there is a "well-established" 
utility for the claimed invention, the threshold is met automatically and the applicant need not 
make any showing to demonstrate utility. Manual of Patent Examining Procedure at § 706.03(a). 
Only if there is no "well-established" utility for the claimed invention must the applicant 
demonstrate the practical benefits of the invention. Id. 

Once the patent applicant identifies a specific utility, the claimed invention is presumed 
to possess it. In re Cortright, 165 F.3d 1353, 1357, 49 USPQ2d 1464 (Fed. Cir. 1999); In re 
Brana, 51 F.3d 1560, 1566, 34 USPQ2d 1436 (Fed. Cir. 1995). In that case, the Patent Office
bears the burden of demonstrating that a person of ordinary skill in the art would reasonably 
doubt that the asserted utility could be achieved by the claimed invention. Id. To do so, the 
Patent Office must provide evidence or sound scientific reasoning. See In re Langer, 503 F.2d 
1380, 1391-92, 183 USPQ 288 (CCPA 1974). If and only if the Patent Office makes such a 
showing, the burden shifts to the applicant to provide rebuttal evidence that would convince the 
person of ordinary skill that there is sufficient proof of utility. Brana, 51 F.3d at 1566. The 
applicant need only prove a "substantial likelihood" of utility; certainty is not required. Brenner, 
383 U.S. at 532. 

II. Uses of the claimed polynucleotide for diagnosis of conditions and disorders
characterized by expression of NHRP, for toxicology testing, and for drug discovery
are sufficient utilities under 35 U.S.C. §§ 101 and 112, first paragraph

The claimed invention meets all of the necessary requirements for establishing a credible 
utility under the Patent Law: There are "well-established" uses for the claimed invention known 
to persons of ordinary skill in the art, and there are specific practical and beneficial uses for the 
invention disclosed in the patent application's specification. These uses are explained, in detail, 
in the previously submitted First Bedilion Declaration, and in the Rockett Declaration, Iyer 
Declaration, and Second Bedilion Declaration accompanying this brief. Objective evidence, not 
considered by the Patent Office, further corroborates the credibility of the asserted utilities. 

A. The use of the claimed polynucleotide for toxicology testing, drug discovery,
and disease diagnosis are practical uses that confer "specific benefits" to the public 

The claimed invention has specific, substantial, real-world utility by virtue of its use in 
toxicology testing, drug development and disease diagnosis through gene expression profiling. 
These uses are explained in detail in the previously submitted First Bedilion Declaration, and in 
the accompanying Rockett Declaration, Iyer Declaration, and Second Bedilion Declaration. The 
claimed invention is a useful tool in cDNA microarrays used to perform gene expression 
analysis. That is sufficient to establish utility for the claimed polynucleotide. 

The instant application is a continuation application of and claims priority to United 
States patent application Serial No. 08/870,870 filed on June 6, 1997 (hereinafter "the Lai '870 
application"), having essentially the identical specification, with the exception of corrected 
typographical errors and reformatting changes. Thus page and line numbers may not match as 
between the Lai '506 application and the Lai '870 application. 

In his First Declaration, Dr. Bedilion explains the many reasons why a person skilled in 
the art reading the Lai '870 application on June 6, 1997 would have understood that application 
to disclose the claimed polynucleotide to be useful for a number of gene expression monitoring 
applications, e.g., as a highly specific probe for the expression of that specific polynucleotide in 
connection with the development of drugs and the monitoring of the activity of such drugs. (First 
Bedilion Declaration at, e.g., ¶¶ 10-15). Much, but not all, of Dr. Bedilion's explanation
concerns the use of the claimed polynucleotide in cDNA microarrays of the type first developed
at Stanford University for evaluating the efficacy and toxicity of drugs, as well as for other
applications. (First Bedilion Declaration, ¶¶ 12 and 15). 1

In connection with his explanations, Dr. Bedilion states that the "Lai '870 application 
would have led a person skilled in the art on June 6, 1997 who was using gene expression 
monitoring in connection with working on developing new drugs for the treatment of immune 
responses and cancers to conclude that a cDNA microarray that contained the SEQ ID NO:74 
polynucleotide would be a highly useful tool and to request specifically that any cDNA 
microarray that was being used for such purposes contain the SEQ ID NO:74 polynucleotide." 
(First Bedilion Declaration, ¶ 15). For example, as explained by Dr. Bedilion, "[p]ersons skilled
in the art would [have appreciated on June 6, 1997] that cDNA microarrays that contained the
SEQ ID NO:74 polynucleotide would be a more useful tool than cDNA microarrays that did not
contain the SEQ ID NO:74 polynucleotide in connection with conducting gene expression
monitoring studies on proposed (or actual) drugs for treating immune responses and cancers for
such purposes as evaluating their efficacy and toxicity." Id.

1 Dr. Bedilion also explained, for example, why persons skilled in the art would also appreciate, based on the Lai
'870 specification, that the claimed polynucleotide would be useful in connection with developing new drugs using
technology, such as northern analysis, that predated by many years the development of the cDNA technology (First
Bedilion Declaration, ¶ 16).

In support of those statements, Dr. Bedilion provided detailed explanations of how cDNA 
technology can be used to conduct gene expression monitoring evaluations, with extensive 
citations to pre-June 6, 1997 publications showing the state of the art on June 6, 1997. (First 
Bedilion Declaration, ¶¶ 10-14). While Dr. Bedilion's explanations in paragraph 15 of his
Declaration include almost three pages of text and six subparts (a)-(f), he specifically states that 
his explanations are not "all-inclusive." Id. For example, with respect to toxicity evaluations, 
Dr. Bedilion had earlier explained how persons skilled in the art who were working on drug 
development on June 6, 1997 (and for several years prior to June 6, 1997) "without any doubt" 
appreciated that the toxicity (or lack of toxicity) of any proposed drug was "one of the most 
important criteria to be considered and evaluated in connection with the development of the 
drug" and how the teachings of the Lai '870 application clearly include using differential gene 
expression analyses in toxicity studies (First Bedilion Declaration, ¶ 10).

Thus, the First Bedilion Declaration establishes that persons skilled in the art reading the 
Lai '870 application at the time it was filed "would have wanted their cDNA microarray to have 
a [SEQ ID NO:74 polynucleotide probe] because a microarray that contained such a probe (as 
compared to one that did not) would provide more useful results in the kind of gene expression 
monitoring studies using cDNA microarrays that persons skilled in the art have been doing since 
well prior to June 6, 1997." (First Bedilion Declaration, ¶ 15, item (f).) This, by itself, provides
more than sufficient reason to compel the conclusion that the Lai '870 application disclosed to 
persons skilled in the art at the time of its filing substantial, specific and credible real-world 
utilities for the claimed polynucleotide. 

In his Declaration, Dr. Rockett explains the many reasons why a person skilled in the art 
in 1997 would have understood that any expressed polynucleotide is useful for a number of gene 
expression monitoring applications, e.g., in cDNA microarrays, in connection with the 
development of drugs and the monitoring of the activity of such drugs. (Rockett Declaration at, 
e.g., ¶¶ 10-18).

It is my opinion, therefore, based on the state of the art in toxicology at 
least since the mid-1990s . . . that disclosure of the sequence of a new gene or 
protein, with or without knowledge of its biological function, would have been
sufficient information for a toxicologist to use the gene and/or protein in
expression profiling studies in toxicology. [Rockett Declaration, ¶ 18.] 2

In his Second Declaration, Dr. Bedilion explains why a person of skill in the art in [year 
of filing] would have understood that any expressed polynucleotide is useful for gene expression 
monitoring applications using cDNA microarrays. (Second Bedilion Declaration, e.g., ¶¶ 4-7.)
In his Declaration, Dr. Iyer explains why a person of skill in the art in 1997 would have 
understood that any expressed polynucleotide is useful for gene expression monitoring 
applications using cDNA microarrays, stating that "[t]o provide maximum versatility as a 
research tool, the microarray should include - and as a biologist I would want my microarray to 
include - each newly identified gene as a probe." (Iyer Declaration, ¶ 9.)

In addition, Dr. Rockett explains in his Declaration that "there are a number of other 
differential expression analysis technologies that precede the development of microarrays, some 
by decades, and that have been applied to drug metabolism and toxicology research, including: 
(1) differential screening; (2) subtractive hybridization, including variants such as chemical 
cross-linking subtraction, suppression-PCR subtractive hybridization and representational 
difference analysis; (3) differential display; (4) restriction endonuclease facilitated analyses, 
including serial analysis of gene expression (SAGE) and gene expression fingerprinting and (5) 
EST analysis." (Rockett Declaration, ¶ 7.)

Nowhere does the Patent Examiner address the fact that, as described at, e.g., page 14,
lines 21-23, page 56, lines 15-19, page 58, line 8 through page 59, line 28, and page 67, line 21 
through page 68, line 14 of the Lai '506 application, the claimed polynucleotide can be used as a 
highly specific probe in, for example, cDNA microarrays - a probe that without question can be 
used to measure both the existence and amount of complementary RNA sequences known to be 
the expression products of the claimed polynucleotide. The claimed invention is not, in that 
regard, some random sequence whose value as a probe is speculative or would require further 
research to determine. 

Given the fact that the claimed polynucleotide is known to be expressed, its utility as a 
measuring and analyzing instrument for expression levels is as indisputable as a scale's utility for 
measuring weight. This use as a measuring tool, regardless of how the expression level data 
ultimately would be used by a person of ordinary skill in the art, by itself demonstrates that the
claimed invention provides an identifiable, real-world benefit that meets the utility requirement.
Raytheon v. Roper, 724 F.2d 951 (Fed. Cir. 1983) (claimed invention need only meet one of its
stated objectives to be useful); In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999) (how the
invention works is irrelevant to utility); MPEP § 2107 ("Many research tools such as gas
chromatographs, screening assays, and nucleotide sequencing techniques have a clear, specific,
and unquestionable utility (e.g., they are useful in analyzing compounds)" (emphasis added)).

2 "Use of the words 'it is my opinion' to preface what someone of ordinary skill in the art would have known does
not transform the factual statements contained in the declaration into opinion testimony." In re Alton, 37 USPQ2d
1578, 1583 (Fed. Cir. 1996).

The First Bedilion Declaration shows that a number of pre-June 6, 1997 publications
confirm and further establish the utility of cDNA microarrays in a wide range of drug
development gene expression monitoring applications at the time the Lai '870 application was
filed (First Bedilion Declaration ¶¶ 10-14; First Bedilion Exhibits A-G). Indeed, Brown and
Shalon U.S. Patent No. 5,807,522 (the Brown '522 patent, First Bedilion Exhibit D), which
issued from a patent application filed in June 1995 and was effectively published on December
29, 1995 as a result of the publication of a PCT counterpart application, shows that the Patent
Office recognizes the patentable utility of the cDNA technology developed in the early to
mid-1990s. As explained by Dr. Bedilion, among other things (First Bedilion Declaration, ¶ 12):

The Brown '522 patent further teaches that the "[m]icroarrays of 
immobilized nucleic acid sequences prepared in accordance with the invention" 
can be used in "numerous" genetic applications, including "monitoring of gene 
expression" applications (see Bedilion Tab D at col. 14, lines 36-42). The Brown 
'522 patent teaches (a) monitoring gene expression (i) in different tissue types, (ii) 
in different disease states, and (iii) in response to different drugs, and (b) that 
arrays disclosed therein may be used in toxicology studies (see First Bedilion Tab 
D at col. 15, lines 13-18 and 52-58; and col. 18, lines 25-30). 

Literature reviews published after the filing of the priority Lai '870 application describing
the state of the art further confirm the claimed invention's utility. Rockett et al. confirm, for
example, that the claimed invention is useful for differential expression analysis regardless of
how expression is regulated:

Despite the development of multiple technological advances which have 
recently brought the field of gene expression profiling to the forefront of 
molecular analysis, recognition of the importance of differential gene expression 
and characterization of differentially expressed genes has existed for many years. 

* * * 

Although differential expression technologies are applicable to a broad 
range of models, perhaps their most important advantage is that, in most cases, 
absolutely no prior knowledge of the specific genes which are up- or down-regulated is
required.

* * * 

Whereas it would be informative to know the identity and functionality of 
all genes up/down regulated by . . . toxicants, this would appear a longer term 
goal .... However, the current use of gene profiling yields a pattern of gene 
changes for a xenobiotic of unknown toxicity which may be matched to that of 
well characterized toxins, thus alerting the toxicologist to possible in vivo 
similarities between the unknown and the standard, thereby providing a platform 
for more extensive toxicological examination. (emphasis in original)

Rockett et al., Differential gene expression in drug metabolism and toxicology:
practicalities, problems and potential, Xenobiotica 29:655-691 (July 1999) (Rockett Declaration,
Exhibit C).

In another post-June 6, 1997 article, Lashkari et al. state explicitly that sequences that are
merely "predicted" to be expressed (predicted Open Reading Frames, or ORFs) - the claimed
invention in fact is known to be expressed - have numerous uses:

Efforts have been directed toward the amplification of each predicted ORF 
or any other region of the genome ranging from a few base pairs to several 
kilobase pairs. There are many uses for these amplicons - they can be cloned into
standard vectors or specialized expression vectors, or can be cloned into other
specialized vectors such as those used for two-hybrid analysis. The amplicons
can also be used directly by, for example, arraying onto glass for expression
analysis, for DNA binding assays, or for any direct DNA assay. (emphasis added)

Lashkari et al., Whole genome analysis: Experimental access to all genome sequenced
segments through larger-scale efficient oligonucleotide synthesis and PCR, Proc. Nat. Acad. Sci.
U.S.A. 94:8945-8947 (Aug. 1997) (Reference No. 11).

B. The use of polynucleotides coding for polypeptides expressed by humans as 
tools for toxicology testing, drug discovery, and the diagnosis of disease is now 
"well-established" 

The technologies made possible by expression profiling and the DNA tools upon which 
they rely are now well-established. The technical literature recognizes not only the prevalence of 
these technologies, but also their unprecedented advantages in drug development, testing and 
safety assessment. These technologies include toxicology testing, e.g., as described by Bedilion,
Rockett, and Iyer in their Declarations.

Toxicology testing is now standard practice in the pharmaceutical industry. See, e.g.,
John C. Rockett et al., supra:

Knowledge of toxin-dependent regulation in target tissues is not solely an 
academic pursuit as much interest has been generated in the pharmaceutical 
industry to harness this technology in the early identification of toxic drug 
candidates, thereby shortening the developmental process and contributing 
substantially to the safety assessment of new drugs. (Rockett Declaration, Exhibit 
C, page 656) 

To the same effect are several other scientific publications, including Emile F. Nuwaysir
et al., Microarrays and toxicology: The advent of toxicogenomics, Molecular Carcinogenesis
24:153-159 (1999) (Reference No. 12); Sandra Steiner and N. Leigh Anderson, Expression
profiling in toxicology - potentials and limitations, Toxicology Letters 112-13:467-471 (2000)
(Reference No. 13).

Nucleic acids useful for measuring the expression of whole classes of genes are routinely
incorporated for use in toxicology testing. Nuwaysir et al. describe, for example, a Human
ToxChip comprising 2089 human clones, which were selected

for their well-documented involvement in basic cellular processes as well 
as their responses to different types of toxic insult. Included on this list are DNA 
replication and repair genes, apoptosis genes, and genes responsive to PAHs and 
dioxin-like compounds, peroxisome proliferators, estrogenic compounds, and 
oxidant stress. Some of the other categories of genes include transcription factors, 
oncogenes, tumor suppressor genes, cyclins, kinases, phosphatases, cell adhesion 
and motility genes, and homeobox genes. Also included in this group are 84 
housekeeping genes, whose hybridization intensity is averaged and used for signal 
normalization of the other genes on the chip. 

See also Table 1 of Nuwaysir et al. (listing additional classes of genes deemed to be of 
special interest in making a human toxicology microarray). 
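
By way of illustration only, the signal normalization that Nuwaysir et al. describe (averaging the
hybridization intensity of housekeeping genes and using that average to normalize the other genes
on the chip) can be sketched as follows. The function name, gene labels, and intensity values are
hypothetical and are not taken from the reference; dividing by the mean is one assumed way of
applying the average.

```python
from statistics import mean


def normalize_by_housekeeping(intensities, housekeeping_ids):
    """Scale every spot intensity by the mean housekeeping-gene intensity.

    intensities: dict mapping a gene label to its raw hybridization intensity.
    housekeeping_ids: labels of the control (housekeeping) genes on the array.
    Both the labels and the division-by-mean scheme are illustrative assumptions.
    """
    controls = [intensities[g] for g in housekeeping_ids if g in intensities]
    if not controls:
        raise ValueError("no housekeeping genes found on the array")
    scale = mean(controls)
    if scale <= 0:
        raise ValueError("mean housekeeping intensity must be positive")
    return {gene: value / scale for gene, value in intensities.items()}


# Hypothetical raw intensities for two control genes and two test genes.
raw = {"HK1": 100.0, "HK2": 300.0, "GENE_A": 400.0, "GENE_B": 50.0}
normalized = normalize_by_housekeeping(raw, ["HK1", "HK2"])
```

Under this sketch, a normalized value above 1 indicates a signal stronger than the average
control signal, and a value below 1 indicates a weaker one.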

The more genes that are available for use in toxicology testing, the more powerful the
technique. "Arrays are at their most powerful when they contain the entire genome of the
species they are being used to study." John C. Rockett and David J. Dix, Application of DNA
arrays to toxicology, Environ. Health Perspec. 107:681-685 (1999) (Reference No. 14). Control
genes are carefully selected for their stability across a large set of array experiments in order to
best study the effect of toxicological compounds. See attached email from the primary
investigator on the Nuwaysir paper, Dr. Cynthia Afshari, to an Incyte employee, dated July 3,
2000, as well as the original message to which she was responding (Reference No. 15),
indicating that even the expression of carefully selected control genes can be altered. Thus, there
is no expressed gene which is irrelevant to screening for toxicological effects, and all expressed
genes have a utility for toxicological screening.

Further evidence of the well-established utility of all expressed polypeptides and 
polynucleotides in toxicology testing is found in U.S. Pat. No. 5,569,588 (Reference No. 5) and 
published PCT applications WO 95/21944 (Reference No. 1), WO 95/20681 (Reference No. 2), 
and WO 97/13877 (Reference No. 7). 

WO 95/21944 ("Differentially expressed genes in healthy and diseased subjects"),
published August 17, 1995, describes the use of microarrays in expression profiling analyses,
emphasizing that patterns of expression can be used to distinguish healthy tissues from diseased
tissues and that patterns of expression can additionally be used in drug development and
toxicology studies, without knowledge of the biological function of the encoded gene product.
In particular, and with emphasis added:

The present invention involves . . . methods for diagnosing diseases . . . 
characterized by the presence of [differentially expressed] . . . genes, despite the 
absence of knowledge about the gene or its function. The methods involve the
use of a composition suitable for use in hybridization which consists of a solid
surface on which is immobilized at pre-defined regions thereon a plurality of
defined oligonucleotide/polynucleotide sequences for hybridization. Each
sequence comprises a fragment of an EST . . . . Differences in hybridization
patterns produced through use of this composition and the specified methods
enable diagnosis of diseases based on differential expression of genes of unknown
function. . . . [abstract]

The method [of the present invention] involves producing and comparing 
hybridization patterns formed between samples of expressed mRNA or cDNA 
polynucleotide sequences . . . and a defined set of
oligonucleotide/polynucleotide[] . . . immobilized on a support. Those defined
[immobilized] oligonucleotide/polynucleotide sequences are representative of the
total expressed genetic component of the cells, tissues, organs or organism as
defined by the collection of partial cDNA sequences (ESTs). [page 2]

The present invention meets the unfilled needs in the art by providing 
methods for the . . . use of gene fragments and genes, even those of unknown full 
length sequence and unknown function, which are differentially expressed in a 
healthy animal and in an animal having a specific disease or infection by use of 
ESTs derived from DNA libraries of healthy and/or diseased/infected animals.
[page 4]

Yet another aspect of the invention is that it provides . . . a means for . . .
monitoring the efficacy of disease treatment regimes including . . . toxicological
effects thereof. [page 4]

It has been appreciated that one or more differentially identified EST or
gene-specific oligonucleotide/polynucleotides define a pattern of differentially
expressed genes diagnostic of a predisease, disease or infective state. A
knowledge of the specific biological function of the EST is not required only that
the EST[] identifies a gene or genes whose altered expression is associated
reproducibly with the predisease, disease or infectious state. [page 4]

As used herein, the term 'disease' or 'disease state' refers to any condition 
which deviates from a normal or standardized healthy state in an organism of the 
same species in terms of differential expression of the organism's genes. . . 
[whether] of genetic or environmental origin, for example, an inherited disorder 
such as certain breast cancers. . . .[or] administration of a drug or exposure of the 
animal to another agent, e.g., nutrition, which affects gene expression. [page 5]

As used herein, the term 'solid support' refers to any known substrate 
which is useful for the immobilization of large numbers of 
oligonucleotide/polynucleotide sequences by any available method . . . [and 
includes, inter alia,] nitrocellulose, . . . glass, silica. . . . [page 6] 

By 'EST' or 'Expressed Sequence Tag' is meant a partial DNA or cDNA 
sequence of about 150 to 500, more preferably about 300, sequential nucleotides. . . .
[page 6]

One or more libraries made from a single tissue type typically provide at 
least about 3000 different (i.e., unique) ESTs and potentially the full complement 
of all possible ESTs representing all cDNAs e.g., 50,000-100,000 in an animal
such as a human. [page 7]

The lengths of the defined oligonucleotide/polynucleotides may be
readily increased or decreased as desired or needed. . . . The length is generally
guided by the principle that it should be of sufficient length to insure that it is on[]
average only represented once in the population to be examined. [page 7]

Comparing the . . . hybridization patterns permits detection of those
defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy control and the disease sample by the presence of differences
in the hybridization patterns at pre-defined regions [of the solid support].
[page 13]

It should be appreciated that one does not have to be restricted in using 
ESTs from a particular tissue from which probe RNA or cDNA is obtained[;] 
rather any or all ESTs (known or unknown) may be placed on the support. 
Hybridization will be used [to] form diagnostic patterns or to identify which 
particular EST is detected. For example, all known ESTs from an organism are 
used to produce a 'master' solid support to which control sample and disease 
samples are alternately hybridized. [page 14]

Diagnosis is accomplished by comparing the two hybridization patterns,
wherein substantial differences between the first and second hybridization
patterns indicate the presence of the selected disease or infection in the animal
being tested. Substantially similar first and second hybridization patterns indicate
the absence of disease or infection. This[,] like many of the foregoing
embodiments[,] may use known or unknown ESTs derived from many libraries.
[page 18]

Still another intriguing use of this method is in the area of monitoring the
effects of drugs on gene expression, both in laboratories and during clinical trials
with animal[s], especially humans. [page 18]

WO 95/20681 ("Comparative Gene Transcript Analysis"), filed in 1994 by Appellants' 
assignee and published August 3, 1995, has three issued U.S. counterparts: U.S. Pat. Nos. 
5,840,484, issued November 24, 1998; 6,114,114, issued September 5, 2000; and 6,303,297, 
issued October 16, 2001. 

The specification describes the use of transcript expression patterns, or "images", each
comprising multiple pixels of gene-specific information, for diagnosis, for cellular phenotyping,
and in toxicology and drug development efforts. The specification describes a plurality of
methods for obtaining the requisite expression data - one of which is microarray hybridization -
and equates the uses of the expression data from these disparate platforms. In particular, and
with emphasis added:

The invention provides a "method and system for quantifying the relative 
abundance of gene transcripts in a biological specimen. . . . [G]ene transcript 
imaging can be used to detect or diagnose a particular biological state, disease, or 
condition which is correlated to the relative abundance of gene transcripts in a 
given cell or population of cells. The invention provides a method for comparing 
the gene transcript image analysis from two or more different biological 
specimens in order to distinguish between the two specimens and identify one or 
more genes which are differentially expressed between the two specimens."
[abstract]

[W]e see each individual gene product as a 'pixel' of information, which
relates to the expression of that, and only that, gene. We teach herein [] methods
whereby the individual 'pixels' of gene expression information can be combined
into a single gene transcript 'image' in which each of the individual genes can be
visualized simultaneously and allowing relationships between the gene pixels to
be easily visualized and understood. [page 2]

The present invention avoids the drawbacks of the prior art by providing a 
method to quantify the relative abundance of multiple gene transcripts in a given 
biological specimen. . . . The method of the instant invention provides for detailed 
diagnostic comparisons of cell profiles revealing numerous changes in the 
expression of individual transcripts. [page 6]

High resolution analysis of gene expression [can] be used directly as a
diagnostic profile. . . . [page 7]

The method is particularly powerful when more than 100 and preferably 
more than 1,000 gene transcripts are analyzed. [page 7]

The invention . . . includes a method of comparing specimens containing 
gene transcripts. [page 7]

The final data values from the first specimen and the further identified 
sequence values from the second specimen are processed to generate ratios of 
transcript sequences, which indicate the differences in the number of gene 
transcripts between the two specimens. [i.e., the results yield analogous data to
microarrays] [page 8]

Also disclosed is a method of producing a gene transcript image analysis 
by first obtaining a mixture of mRNA, from which cDNA copies are made.
[page 8]

In a further embodiment, the relative abundance of the gene transcripts in 
one cell type or tissue is compared with the relative abundance of gene transcript 
numbers in a second cell type or tissue in order to identify the differences and 
similarities. [page 9]

In essence, the invention is a method and system for quantifying the 
relative abundance of gene transcripts in a biological specimen. The invention 
provides a method for comparing the gene transcript image from two or more 
different biological specimens in order to distinguish between the two specimens.
. . . [page 9]

[T]wo or more gene transcript images can be compared and used to detect 
or diagnose a particular biological state, disease, or condition which is correlated 
to the relative abundance of gene transcripts in a given cell or population of cells.
[pages 9-10]

The present invention provides a method to compare the relative 
abundance of gene transcripts in different biological specimens. . . . This process 
is denoted herein as gene transcript imaging. The quantitative analysis of the 
relative abundance for a set of gene transcripts is denoted herein as 'gene 
transcript image analysis' or 'gene transcript frequency analysis'. The present
invention allows one to obtain a profile for gene transcription in any given
population of cells or tissue from any type of organism. [page 11]

The invention has significant advantages in the fields of diagnostics, 
toxicology and pharmacology, to name a few. [page 12]

[G]ene transcript sequence abundances are compared against reference 
database sequence abundances including normal data sets for diseased and healthy 
patients. The patient has the disease(s) with which the patient's data set most
closely correlates. [page 12]

For example, gene transcript frequency analysis can be used to 
differentiate normal cells or tissues from diseased cells or tissues. . . . [page 12] 

In toxicology, . . . [g]ene transcript imaging provides highly detailed
information on the cell and tissue environment, some of which would not be 
obvious in conventional, less detailed screening methods. The gene transcript 
image is a more powerful method to predict drug toxicity and efficacy. Similar 
benefits accrue in the use of this tool in pharmacology. . . . [page 12] 

In an alternative embodiment, comparative gene transcript frequency
analysis is used to differentiate between cancer cells which respond to anti-cancer
agents and those which do not respond. [page 12]

In a further embodiment, comparative gene transcript frequency analysis is 
used . . . for the selection of better pharmacologic animal models. [page 14]

In a further embodiment, comparative gene transcript frequency analysis is 
used in a clinical setting to give a highly detailed gene transcript profile of a 
diseased state or condition. [page 14]

An alternate method of producing a gene transcript image includes the 
steps of obtaining a mixture of test mRNA and providing a representative array of 
unique probes whose sequences are complementary to at least some of the test 
mRNAs. Next, a fixed amount of the test mRNA is added to the arrayed probes. 
The test mRNA is incubated with the probes for a sufficient time to allow hybrids 
of the test mRNA and probes to form. The mRNA-probe hybrids are detected and 
the quantity determined. [page 15]

[T]his research tool provides a way to get new drugs to the public faster
and more economically. [page 36]

In this method, the particular physiologic function of the protein transcript 
need not be determined to qualify the gene transcript as a clinical marker.
[page 38]

[T]he gene transcript changes noted in the earlier rat toxicity study are 
carefully evaluated as clinical markers in the followed patients. Changes in the 
gene transcript image analyses are evaluated as indicators of toxicity by 
correlation with clinical signs and symptoms and other laboratory results. . . . The 
. . . analysis highlights any toxicological changes in the treated patients. [page 39]
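
By way of illustration only, the comparison of relative transcript abundance between two
specimens described in the passages quoted above can be sketched as a per-transcript ratio of
frequencies. The function name, the transcript labels, the counts, and the use of a pseudocount
to avoid division by zero are all assumptions made for this sketch; none is taken from WO 95/20681.

```python
def abundance_ratios(counts_a, counts_b, pseudocount=1.0):
    """Return, for each transcript, the ratio of its relative abundance in
    specimen A to its relative abundance in specimen B.

    counts_a, counts_b: dicts mapping a transcript label to the number of
    times it was observed. A pseudocount (an assumption of this sketch) is
    added so that transcripts absent from one specimen do not cause
    division by zero.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    transcripts = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + pseudocount * len(transcripts)
    total_b = sum(counts_b.values()) + pseudocount * len(transcripts)
    ratios = {}
    for t in transcripts:
        freq_a = (counts_a.get(t, 0) + pseudocount) / total_a
        freq_b = (counts_b.get(t, 0) + pseudocount) / total_b
        ratios[t] = freq_a / freq_b
    return ratios


# Hypothetical transcript counts from two specimens.
specimen_a = {"T1": 30, "T2": 10}
specimen_b = {"T1": 10, "T2": 30}
ratios = abundance_ratios(specimen_a, specimen_b)
```

A ratio well above or below 1 flags a transcript whose relative abundance differs between the
two specimens, whatever platform produced the underlying counts.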



U.S. Pat. No. 5,569,588 ("Methods for Drug Screening") ("the '588 patent"), issued
October 29, 1996, with a priority date of August 1995, describes an expression profiling
platform, the "genome reporter matrix", which is different from nucleic acid microarrays.
Additionally describing use of nucleic acid microarrays, the '588 patent makes clear that the
utility of comparing multidimensional expression datasets is independent of the methods by
which such profiles are obtained. The '588 patent speaks clearly to the usefulness of such
expression analyses in drug development and toxicology, particularly pointing out that a gene's
failure to change in expression level is a useful result. Thus, with emphasis added,

The invention provides "[m]ethods and compositions for modeling the
transcriptional responsiveness of an organism to a candidate drug. . . . [The final 
step of the method comprises] comparing reporter gene product signals for each 
cell before and after contacting the cell with the candidate drug to obtain a drug 
response profile which provides a model of the transcriptional responsiveness of 
said organism to the candidate drug." [abstract] 

The present invention exploits the recent advances in genome science to 
provide for the rapid screening of large numbers of compounds against a systemic 
target comprising substantially all targets in a pathway [or] organism. [col. 1]

The ensemble of reporting cells comprises as comprehensive a collection 
of transcription regulatory genetic elements as is conveniently available for the 
targeted organism so as to most accurately model the systemic transcriptional 
response. Suitable ensembles generally comprise thousands of individually 
reporting elements; preferred ensembles are substantially comprehensive, i.e. 
provide a transcriptional response diversity comparable to that of the target 
organism. Generally, a substantially comprehensive ensemble requires 
transcription regulatory genetic elements from at least a majority of the 
organism's genes, and preferably includes those of all or nearly all of the genes. 
We term such a substantially comprehensive ensemble a genome reporter matrix.
[col. 2]

Drugs often have side effects that are in part due to the lack of target 
specificity. . . . [A] genome reporter matrix reveals the spectrum of other genes in 
the genome also affected by the compound. In considering two different 
compounds both of which induce the ERG10 reporter, if one compound affects
the expression of 5 other reporters and a second compound affects the expression
of 50 other reporters, the first compound is, a priori, more likely to have fewer side
effects. [cols. 2-3]

Furthermore, it is not necessary to know the identity of any of the 
responding genes. [col. 3]

[A]ny new compound that induces the same response profile as [a] . . . 
dominant tubulin mutant would provide a candidate for a taxol-like 
pharmaceutical. [col. 4]

The genome reporter matrix offers a simple solution to recognizing new
specificities in combinatorial libraries. Specifically, pools of new compounds are
tested as mixtures across the matrix. If the pool has any new activity not present
in the original lead compound, new genes are affected among the reporters.
[col. 4]

A sufficient number of different recombinant cells are included to provide 
an ensemble of transcriptional regulatory elements of said organism sufficient to 
model the transcriptional responsiveness of said organism to a drug. In a preferred 
embodiment, the matrix is substantially comprehensive for the selected regulatory 
elements, e.g. essentially all of the gene promoters of the targeted organism are 
included. [cols. 6-7]

In a preferred embodiment, the basal response profiles are determined. . . . 
The resultant electrical output signals are stored in a computer memory as 
genome reporter output signal matrix data structure associating each output signal 
with the coordinates of the corresponding microtiter plate well and the stimulus or 
drug. This information is indexed against the matrix to form reference response 
profiles that are used to determine the response of each reporter to any milieu in 
which a stimulus may be provided. After establishing a basal response profile for 
the matrix, each cell is contacted with a candidate drug. The term drug is used 
loosely to refer to agents which can provoke a specific cellular response. . . . The 
drug induces a complex response pattern of repression, silence and induction 
across the matrix . . . . The response profile reflects the cell's transcriptional
adjustments to maintain homeostasis in the presence of the drug. . . . After
contacting the cells with the candidate drug, the reporter gene product signals
from each of said cells is again measured to determine a stimulated response
profile. The basal o[r] background response profile is then compared with . . . the
stimulated response profile to identify the cellular response profile to the
candidate drug. [cols. 7-8]

In another embodiment of the invention, a matrix [i.e., array] of 
hybridization probes corresponding to a predetermined population of genes of the 
selected organism is used to specifically detect changes in gene transcription 
which result from exposing the selected organism or cells thereof to a candidate 
drug. In this embodiment, one or more cells derived from the organism is 
exposed to the candidate drug in vivo or ex vivo under conditions wherein the 
drug effects a change in gene transcription in the cell to maintain homeostasis. 
Thereafter, the gene transcripts, primarily mRNA, of the cell or cells is isolated . . 
. [and] then contacted with an ordered matrix [array] of hybridization probes, each 
probe being specific for a different one of the transcripts, under conditions where 
each of the transcripts hybridizes with a corresponding one of the probes to form 
hybridization pairs. The ordered matrix of probes provides, in aggregate, 
complements for an ensemble of genes of the organism sufficient to model the 
transcriptional responsiveness of the organism to a drug. . . . The matrix-wide 
signal profile of the drug-stimulated cells is then compared with a matrix-wide
signal profile of negative control cells to obtain a specific drug response profile. 
[col. 8]

The invention also provides means for computer-based qualitative analysis
of candidate drugs and unknown compounds. A wide variety of reference
response profiles may be generated and used in such analyses. [col. 8]

Response profiles for an unknown stimulus (e.g. new chemicals, unknown 
compounds or unknown mixtures) may be analyzed by comparing the new 
stimulus response profiles with response profiles to known chemical stimuli.
[col. 9]

The response profile of a new chemical stimulus may also be compared to 
a known genetic response profile for target gene(s). [col. 9] 
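
By way of illustration only, the comparison of an unknown stimulus response profile with
reference response profiles to known stimuli, as described in the passages quoted above, can be
sketched as a nearest-profile search. The '588 patent does not specify a similarity measure in the
quoted text; the use of Pearson correlation here is an assumption, as are the function names,
compound labels, and values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / sqrt(var_x * var_y)


def closest_reference(unknown, references):
    """Score the unknown profile against each reference profile and return
    the best-matching reference label together with all scores."""
    scores = {name: pearson(unknown, profile) for name, profile in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical response profiles: one value per reporter (or probe), giving
# the change in signal after exposure to the stimulus.
references = {
    "compound_A": [2.0, 0.0, -1.0, 0.5],
    "compound_B": [-1.0, 1.5, 2.0, 0.0],
}
unknown = [1.8, 0.1, -0.9, 0.4]
best, scores = closest_reference(unknown, references)
```

In this sketch the unknown stimulus is matched to the reference whose pattern of induction and
repression it most closely tracks, without any need to know the identity of the responding genes.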

The August 11, 1997 press release from the '588 patent's assignee, Acacia Biosciences 
(now part of Merck) (Reference No. 8 attached hereto), and the September 15, 1997 news report 
by Glaser, "Strategies for Target Validation Streamline Evaluation of Leads," Genetic 
Engineering News (Reference No. 9 attached hereto), attest to the commercial value of the methods
and technology described and claimed in the '588 patent. 

WO 97/13877 ("Measurement of Gene Expression Profiles in Toxicity Determinations"),
published April 17, 1997, describes an expression profiling technology differing somewhat from
the use of cDNA microarrays and differing from the genome reporter matrix of the '588 patent;
but the use of the data is analogous. As per its title, the reference describes use of expression
profiling in toxicity determinations. In particular, and with emphasis added:

[T]he invention relates to a method for detecting and monitoring changes 
in gene expression patterns in in vitro and in vivo systems for determining the 
toxicity of drug candidates. [Field of the invention] 

An object of the invention is to provide a new approach to toxicity 
assessment based on an examination of gene expression patterns, or profiles, in in
vitro or in vivo test systems. [page 3]

Another object of the invention is to provide a rapid and reliable method 
for correlating gene expression with short term and long term toxicity in test 
animals. [page 3]

The invention achieves these and other objects by providing a method for 
massively parallel signature sequencing of genes expressed in one or more 
selected tissues of an organism exposed to a test compound. An important feature 
of the invention is the application of novel . . . methodologies that permit the 
formation of gene expression profiles for selected tissues. . . . Such profiles may
be compared with those from tissues of control organisms at single or multiple
time points to identify expression patterns predictive of toxicity. [page 3]



As used herein, the terms 'gene expression profile,' and 'gene expression 
pattern' which is used equivalently, means a frequency distribution of sequences 
of portions of cDNA molecules sampled from a population of tag-cDNA
conjugates. . . . Preferably, the total number of sequences determined is at least
1000; more preferably, the total number of sequences determined in a gene
expression profile is at least ten thousand. [page 7]

The invention provides a method for determining the toxicity of a 
compound by analyzing changes in the gene expression profiles in selected tissues
of test organisms exposed to the compound. . . . Gene expression profiles
derived from test organisms are compared to gene expression profiles derived
from control organisms. . . . [page 7]



Therefore, the potential benefits to the public, in terms of lives saved and reduced health 
care costs, are enormous. Evidence of the benefits of this information includes: 

• In 1999, CV Therapeutics, an Incyte collaborator, was able to use Incyte gene 
expression technology, information about the structure of a known transporter 
gene, and chromosomal mapping location, to identify the key gene associated 
with Tangier disease. This discovery took place over a matter of only a few 
weeks, due to the power of these new genomics technologies. The discovery 
received an award from the American Heart Association as one of the top 10 
discoveries associated with heart disease research in 1999. 

• In an April 9, 2000, article published by the Bloomberg news service, an Incyte 
customer stated that it had reduced the time associated with target discovery and 
validation from 36 months to 18 months, through use of Incyte's genomic 
information database. Other Incyte customers have privately reported similar 
experiences. The implications of this significant saving of time and expense for 
the number of drugs that may be developed and their cost are obvious. 

• In a February 10, 2000, article in the Wall Street Journal, one Incyte customer 
stated that over 50 percent of the drug targets in its current pipeline were derived 
from the Incyte database. Other Incyte customers have privately reported similar 
experiences. By doubling the number of targets available to pharmaceutical 
researchers, Incyte genomic information has demonstrably accelerated the 
development of new drugs. 

Because the Patent Examiner failed to address or consider the "well-established" utilities 
for the claimed invention in toxicology testing, drug development, and the diagnosis of disease, 
the Examiner's rejections should be overturned regardless of their merit. 

C. Objective evidence corroborates the utilities of the claimed invention 

There is, in fact, no restriction on the kinds of evidence a Patent Examiner may consider 
in determining whether a "real-world" utility exists. "Real-world" evidence, such as evidence 
showing actual use or commercial success of the invention, can demonstrate conclusive proof of 
utility. Raytheon v. Roper, 220 USPQ 592 (Fed. Cir. 1983); Nestle v. Eugene, 55 F.2d 854, 
856, 12 USPQ 335 (6th Cir. 1932). Indeed, proof that the invention is made, used or sold by any 
person or entity other than the patentee is conclusive proof of utility. United States Steel Corp. 
v. Phillips Petroleum Co., 865 F.2d 1247, 1252, 9 USPQ2d 1461 (Fed. Cir. 1989). 

Over the past several years, a vibrant market has developed for databases containing the 
sequences of all expressed genes (along with the polypeptide translations of those genes), in 
particular genes having medical and pharmaceutical significance such as the instant sequence. 
(Note that the value in these databases is enhanced by their completeness, but each sequence in 
them is independently valuable.) The databases sold by Appellants' assignee, Incyte, include 
exactly the kinds of information made possible by the claimed invention, such as tissue and 
disease associations. Incyte sells its database containing the claimed sequence and millions of 
other sequences throughout the scientific community, including to pharmaceutical companies 
who use the information to develop new pharmaceuticals. 

Both Incyte's customers and the scientific community have acknowledged that Incyte's 
databases have proven to be valuable in, for example, the identification and development of drug 
candidates. Page et al., in discussing the identification and assignment of candidate drug targets, 
state that "rapid identification and assignment of candidate targets and markers represents a huge 
challenge ... [t]he process of annotation is similarly aided by the quantity and richness of the 
sequence specific databases that are currently available, both in the public domain and in the 
private sector (e.g. those supplied by Incyte Pharmaceuticals)" (Page, M.J. et al., "Proteomics: a 
major new technology for the drug discovery process," Drug Discov. Today 4:55-62 (1999) 
(Reference No. 16), see page 58, col. 2). As Incyte adds information to its databases, including 
the information that can be generated only as a result of Incyte's invention of the claimed 
polynucleotide and its use of that polynucleotide on cDNA microarrays, the databases become 
even more powerful tools. Thus the claimed invention adds more than incremental benefit to the 
drug discovery and development process. 

Customers can, moreover, purchase the claimed polynucleotide directly from Incyte, 
saving the customer the time and expense of isolating and purifying or cloning the 
polynucleotide for research uses such as those described supra. 

III. The Patent Examiner's rejections are without merit 

Rather than responding to the evidence demonstrating utility, the Examiner attempts to 
dismiss it altogether by arguing that the disclosed and well-established utilities for the claimed 
polynucleotide are not "specific, substantial, credible" utilities. (Final Office Action, page 2.) 
The Examiner is incorrect both as a matter of law and as a matter of fact. 

A. The precise biological role or function of an expressed polynucleotide is not 
required to demonstrate utility 

The Patent Examiner's rejection of the claimed invention is based partly on the ground 
that, without information as to the precise "biological significance" (Final Office Action, page 2) 
of the claimed invention, the claimed invention's utility is not sufficiently specific. According to 
the Examiner, it is not enough that a person of ordinary skill in the art could use and, in fact, 
would want to use the claimed invention either by itself or in a cDNA microarray to monitor the 
expression of genes for such applications as the evaluation of a drug's efficacy and toxicity. The 
Examiner would require, in addition, that the applicant provide a specific and substantial 
interpretation of the results generated in any given expression analysis. 

It may be that specific and substantial interpretations and detailed information on 
biological function are necessary to satisfy the requirements for publication in some technical 
journals, but they are not necessary to satisfy the requirements for obtaining a United States 
patent. The relevant question is not, as the Examiner would have it, whether it is known how or 
why the invention works, In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999), but rather 
whether the invention provides an "identifiable benefit" in presently available form. Juicy Whip 
Inc. v. Orange Bang Inc., 185 F.3d 1364, 1366 (Fed. Cir. 1999). If the benefit exists, and there is 
a substantial likelihood the invention provides the benefit, it is useful. There can be no doubt, 
particularly in view of the First Bedilion Declaration (at, e.g., ¶¶ 10 and 15), that the present 
invention meets this test. 

The threshold for determining whether an invention produces an identifiable benefit is 
low. Juicy Whip, 185 F.3d at 1366. Only those utilities that are so nebulous that a person of 
ordinary skill in the art would not know how to achieve an identifiable benefit and, at least 
according to the PTO guidelines, so-called "throwaway" utilities that are not directed to a person 
of ordinary skill in the art at all, do not meet the statutory requirement of utility. Utility 
Examination Guidelines, 66 Fed. Reg. 1092 (Jan. 5, 2001). 

Knowledge of the biological function or significance of a biological molecule has never 
been required to show real-world benefit. In its most recent explanation of its own utility 
guidelines, the PTO acknowledged as much (66 F.R. at 1095): 

[T]he utility of a claimed DNA does not necessarily depend on the 
function of the encoded gene product. A claimed DNA may have specific and 
substantial utility because, e.g., it hybridizes near a disease-associated gene or it 
has gene-regulating activity. 

By implicitly requiring knowledge of biological function for any claimed nucleic acid, 
the Examiner has, contrary to law, elevated what is at most an evidentiary factor into an absolute 
requirement of utility. Rather than looking to the biological role or function of the claimed 
invention, the Examiner should have looked first to the benefits it is alleged to provide. 

B. Membership in a class of useful products can be proof of utility 

Despite the uncontradicted evidence that the claimed polynucleotide encodes a 
polypeptide expressed by humans, the Examiner refused to impute the utility of the members of 
the family of expressed polypeptides to NHRP. 

In order to demonstrate utility by membership in a class, the law requires only that the 
class not contain a substantial number of useless members. So long as the class does not contain 
a substantial number of useless members, there is sufficient likelihood that the claimed invention 
will have utility, and a rejection under 35 U.S.C. § 101 is improper. That is true regardless of 
how the claimed invention ultimately is used and whether or not the members of the class 
possess one utility or many. See Brenner v. Manson, 383 U.S. 519, 532 (1966); Application of 
Kirk, 376 F.2d 936, 943 (CCPA 1967). 

Membership in a "general" class is insufficient to demonstrate utility only if the class 
contains a sufficient number of useless members such that a person of ordinary skill in the art 
could not impute utility by a substantial likelihood. There would be, in that case, a substantial 
likelihood that the claimed invention is one of the useless members of the class. In the few cases 
in which class membership did not prove utility by substantial likelihood, the classes did in fact 
include predominately useless members. E.g., Brenner (man-made steroids); Kirk (same); Natta 
(man-made polyethylene polymers). 

The Examiner addresses NHRP as if the general class in which it is included is not the 
family of expressed polypeptides, but rather all polynucleotides or all polypeptides, including the 
vast majority of useless theoretical molecules not occurring in nature, and thus not pre-selected 
by nature to be useful. While these "general classes" may contain a substantial number of 
useless members, the family of expressed polypeptides does not. The family of expressed 
polypeptides is sufficiently specific to rule out any reasonable possibility that NHRP would not 
also be useful like the other members of the family. 

Because the Examiner has not presented any evidence that the family of expressed 
polypeptides has any, let alone a substantial number, of useless members, the Examiner must 
conclude that there is a "substantial likelihood" that the NHRP encoded by the claimed 
polynucleotide is useful. It follows that the claimed polynucleotide also is useful. 

C. Because the uses of the claimed polynucleotide in toxicology testing, drug 
discovery, and disease diagnosis are practical uses beyond mere study of the invention 
itself, the claimed invention has substantial utility 

As used in toxicology testing, drug discovery, and disease diagnosis, the claimed 
invention has a beneficial use in research other than studying the claimed invention or its protein 
products. It is a tool, rather than an object, of research. The data generated in gene expression 
monitoring using the claimed invention as a tool is not used merely to study the claimed 
polynucleotide itself, but rather to study properties of tissues, cells, and potential drug candidates 
and toxins. Without the claimed invention, the information regarding the properties of tissues, 
cells, drug candidates and toxins is less complete. [First Bedilion Declaration at ¶ 15.] 

The use of the claimed invention as a research tool in toxicology testing is specific and 
substantial. While it is true that all polypeptides and polynucleotides expressed in humans have 
utility in toxicology testing based on the property of being expressed at some time in 
development or in the cell life cycle, this basis for utility does not preclude that utility from being 
specific and substantial. A toxicology test using any particular expressed polypeptide or 
polynucleotide is dependent on the identity of that polypeptide or polynucleotide, not on its 
biological function or its disease association. The results obtained from using any particular 
human-expressed polypeptide or polynucleotide in toxicology testing are specific to both the 
compound being tested and the polypeptide or polynucleotide used in the test. No two human- 
expressed polypeptides or polynucleotides are interchangeable for toxicology testing because the 
effects on the expression of any two such polypeptides or polynucleotides will differ depending 
on the identity of the compound tested and the identities of the two polypeptides or 
polynucleotides. It is not necessary to know the biological functions and disease associations of 
the polypeptides or polynucleotides in order to carry out such toxicology tests. Therefore, at the 
very least, the claimed polynucleotide is a specific control for toxicology tests in developing 
drugs targeted to other polypeptides or polynucleotides, and is clearly useful as such. 

As an example, any histone gene or protein expressed in humans can be used in a specific 
and substantial toxicology test in drug development. A histone gene or protein may not be 
suitable as a target for drug development because disruption of such a gene may kill a patient. 
However, a human-expressed histone gene or protein is surely an excellent subject for toxicology 
studies when developing drugs targeted to other genes or proteins. A drug candidate which alters 
expression of a histone gene or protein is toxic because disruption of such a pervasively- 
expressed gene or protein would have undesirable side effects in a patient. Therefore, when 
testing the toxicology of a drug candidate targeted to another gene or protein, measuring the 
expression of a histone gene or protein is a good measure of the toxicity of that candidate, 
particularly in in vitro cellular assays at an early stage of drug development. The utility of any 
particular human-expressed histone gene or protein in toxicology testing is specific and 
substantial because a toxicology test using that histone gene or protein cannot be replaced by a 
toxicology test using a different gene, including any other histone gene or protein. This specific 
and substantial utility requires no knowledge of the biological function or disease association of 
the histone gene or protein. 

The expression of the SEQ ID NO:74 polynucleotide in human tissues would lead a 
skilled artisan to believe that this polynucleotide has some physiological implications, even if 
these implications have not been precisely identified. During toxicology testing, a change in 
expression of a human-expressed polynucleotide indicates potential toxicity of a drug candidate, 
even if the physiological implications of that polynucleotide or of the polypeptide encoded by 
that polynucleotide are unknown. Such a toxicology test allows one to choose a lead drug 
candidate which has minimal effects on the expression of proteins other than the protein to which 
the candidate is targeted. Such a lead drug candidate would be less likely to have unintended 
side effects than a drug candidate having greater effects on the expression of genes/proteins other 
than the intended drug target. Thus, the benefit of such a toxicology test is an increased chance 
of finding a safe and effective drug, and a corresponding reduction in the expense and time of 
bringing a drug to market. 

The claimed invention has numerous additional uses as a research tool, each of which 
alone is a "substantial utility." These include diagnostic assays (e.g., pages 55-59) and 
chromosomal mapping (e.g., pages 59-60). 

D. The Patent Examiner failed to demonstrate that a person of ordinary skill in 
the art would reasonably doubt the utility of the claimed invention 

The Examiner bases the utility rejection on two issues: that the utilities of the claimed 
polynucleotide in toxicology testing are "not specific to the claimed polynucleotides" and that 
"the results of gene expression monitoring assays would be meaningless without significant 
further research." (Final Office Action, page 5.) Appellants demonstrate below that the claimed 
uses meet the requirement that the claimed invention yield a "specific benefit" and explain why 
these uses constitute more than "further research" into the claimed invention itself. 

1. Biological function, differential expression, or disease association is 
irrelevant to utility 

The Examiner states that "[a]fter further research, a specific and substantial credible 
utility might be found for the claimed isolated polynucleotides" (Final Office Action, page 3.) 
The Examiner alleges that such a finding of utility would require demonstrated biological 
function, disease association, or differential expression of the claimed polynucleotide. (Final 
Office Action, e.g., pages 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 17, and 21.) The Examiner, however, 
continues to ignore other utilities discussed in the Specification and/or well known in the art, 
such as toxicology testing, alleging that "the results of gene expression monitoring assays would 
be meaningless without significant further research." (Final Office Action, page 5.) 

Appellants have demonstrated a utility for the claimed SEQ ID NO:74 polynucleotide 
and the encoded SEQ ID NO:37 polypeptide irrespective of whether or not a person would wish 
to perform additional experimentation on biological function, disease association, or differential 
expression as another utility. The fact that additional experimentation could be performed to 
determine the biological function, disease association, or differential expression of the claimed 
SEQ ID NO:74 polynucleotide and the encoded SEQ ID NO:37 polypeptide does not preclude, 
and is in fact irrelevant to, the actual utility of the invention. That utility exists today regardless 
of the biological function, disease association, or differential expression of the claimed SEQ ID 
NO:74 polynucleotide and the encoded SEQ ID NO:37 polypeptide. (See, e.g., Rockett 
Declaration, ¶ 18 and Iyer Declaration, ¶ 9.) 

Monitoring the expression of the claimed polynucleotide or the polypeptide encoded by 
the claimed polynucleotide gives important information on the potential toxicity of a drug 
candidate that is specifically targeted to any other polypeptide, regardless of the biological 
function, disease association, or differential expression of the claimed polynucleotide or the 
polypeptide encoded by the claimed polynucleotide. The claimed polynucleotide or the 
polypeptide encoded by the claimed polynucleotide is useful for measuring the toxicity of drug 
candidates specifically targeted to other polynucleotides or polypeptides regardless of any 
possible utility for measuring the properties of the claimed polynucleotide or the polypeptide 
encoded by the claimed polynucleotide. 

2. Use of the claimed polynucleotide in toxicology testing 

The Office Action does not find the Bedilion Declaration persuasive, alleging that 
"any new polynucleotide can be used in a microarray, and thus this asserted utility is not 
specific" and that "the specification does not disclose that NHRP is expressed in at altered levels 
or forms in tissues exhibiting a pathological state." (Final Office Action, page 4.) 

The Examiner's arguments amount to nothing more than the Examiner's disagreement 
with the Bedilion Declaration and the Appellants' assertions about the knowledge of a person of 
ordinary skill in the art, and are tantamount to the substitution of the Examiner's own judgment 
for that of the Appellants' expert. The Examiner must accept the Appellants' assertions to be 
true. The Examiner is, moreover, wrong on the facts because the Bedilion Declaration 
demonstrates how one of skill in the art, reading the specification at the time the parent Lai '870 
application was filed (June 6, 1997), would have understood that specification to disclose the use 
of the claimed polynucleotide in gene expression monitoring for toxicology testing, drug 
development, and the diagnosis of disease (See the First Bedilion Declaration at, e.g., ¶¶ 10-16). 

For example, monitoring the expression of the SEQ ID NO:74 polynucleotide is a method 
of testing the toxicology of drug candidates during the drug development process. Dr. Bedilion 
in his Declaration states that "good drugs are not only potent, they are specific. This means that 
they have strong effects on a specific biological target and minimal effects on all other biological 
targets." (First Bedilion Declaration ¶ 10.) Thus, if the expression of a particular polynucleotide 
is affected in any way by exposure to a test compound, and if that particular polynucleotide is not 
the specific target of the test compound (e.g., if the test compound is a drug candidate), then the 
change in expression is an indication that the test compound may have undesirable toxic side 
effects. It is important to note that such an indication of possible toxicity is specific not only for 
each compound tested, but also for each and every individual polynucleotide whose expression is 
being monitored. 

However, the Examiner continues to view the utility in toxicology testing of the claimed 
polynucleotide as requiring knowledge of either the biological function or disease association or 
differential expression of the claimed polynucleotide. The Examiner views toxicology testing as 
a process to measure the toxicity of a drug candidate only when that drug candidate is 
specifically targeted to the claimed polynucleotide. The Examiner has refused to consider that 
the claimed polynucleotide is useful for measuring the toxicity of drug candidates which are 
targeted not to the claimed polynucleotide, but to other polynucleotides. This utility of the 
claimed polynucleotide does not require any knowledge of the biological function or disease 
association or differential expression of the SEQ ID NO:37 polypeptide or SEQ ID NO:74 
polynucleotide and is a specific, substantial and credible utility. (See, e.g., Rockett Declaration, 
¶ 18 and Iyer Declaration, ¶ 9.) 

The Final Office Action emphasizes that "[s]ince any polynucleotide can be used in a 
microarray, such a use is not specific to the claimed polynucleotides" (Final Office Action, page 
5); however, Appellants note that: 

To meet the utility requirement of sections 101 and 112 of the Patent Act, 
the patent applicant need only show that the claimed invention is "practically 
useful," Anderson v. Natta, 480 F.2d 1392, 1397, 178 USPQ 458 (CCPA 1973) 
and confers a "specific benefit" on the public. Brenner v. Manson, 383 U.S. 519, 
534-35, 148 USPQ 689 (1966). 

Practical real-world uses are not limited to uses that are unique to an invention. The law 
requires that the practical utility be "definite," not particular. Montedison, 664 F.2d at 375. 
Appellants are not aware of any court that has rejected an assertion of utility on the grounds that 
it is not "particular" or "unique" to the specific invention. 

3. Discussion of toxicology testing in the Specification 

The Examiner alleges that "the particulars of toxicology testing with the claimed 
polynucleotides are not disclosed in the instant specification." (Final Office Action, page 7.) 
Well-established utilities, such as toxicology testing by the use of cDNA microarrays, need not 
be explicitly disclosed in a patent application. Furthermore, the Examiner's position amounts to 
nothing more than the Examiner's disagreement with the First Bedilion Declaration (which 
purports therefore to substitute the Examiner's judgment for that of Appellants' expert) and 
Appellants' assertions about the knowledge of a person of ordinary skill. The Examiner must 
accept Appellants' assertions to be true. The Final Office Action fails to address the disclosure 
in the instant specification on gene and protein expression monitoring applications, as discussed 
below. 

Support for the utility of the claimed polynucleotide in toxicology testing, as well as for 
utility in drug screening, may be found in the specification. For example, the parent Lai '870 
application discloses that the polynucleotide sequences disclosed therein, including the SEQ ID 
NO:74 polynucleotide, are useful as probes in microarrays. (Lai '870 application, page 58, line 
13 through page 60, line 4 and page 67, line 28 through page 68, line 21.) The Lai '870 
Specification teaches that microarrays can be used "to monitor the expression level of large 
numbers of genes simultaneously (to produce a transcript image)" for a number of purposes, 
including "in developing and in monitoring the activities of therapeutic agents" (Lai '870 
application at page 58, lines 14-18). 

4. Utility of all expressed polynucleotides in toxicology testing 

The Examiner argues that "[s]ince any polynucleotide can be used in a 
microarray, such a use is not specific to the claimed polynucleotides." (Final Office Action, 
page 5.) The Examiner further alleged that "any orphan gene can be used in the microarrays 
described by Rockett et al." (Rockett Declaration, Exhibit C) and that therefore "[t]he asserted 
utility for the claimed polynucleotide is not specific to the claimed polynucleotide." (Final 
Office Action, page 6.) The Examiner doesn't point to any law, however, that says a utility that 
is shared by a large class is somehow not a utility. If all of the class of expressed 
polynucleotides can be so used, then they all have utility. The issue is, once again, whether the 
claimed polynucleotide and encoded polypeptide have any utility, not whether other compounds 
have a similar utility. Nothing in the law says that an invention must have a "unique" utility. 
Indeed, the whole notion of well-established utilities PRESUPPOSES that many different 
inventions can have the exact same utility (if the Examiner's argument were correct, there could 
never be a well-established utility, because you could always find a generic group with the same 
utility!). 

It is true that just about any expressed polynucleotide will have use as a toxicology 
control, but Appellants need not argue this for the purposes of this case. Appellants argue only 
that this particular claimed invention could be so used, and have provided, e.g., the First Bedilion 
Declaration, the Rockett Declaration, and the Iyer Declaration to back this up. The point is not 
whether or not the claimed polynucleotide is, in any given toxicology test, differentially 
expressed. The point is that the invention provides a useful measuring stick regardless of 
whether there is or is not differential expression. That makes the invention useful today, in the 
real-world, for real purposes. 

5. The Final Office Action on page 6 asserts that the Appellants have made a 
misplaced analogy by comparing the claimed polynucleotide to a scale. The Examiner asserts 
that a microarray is analogous to a scale while the claimed polynucleotide is analogous to "the 
object being weighed on the scale" which does not necessarily have patentable utility. The 
Examiner further asserts that "[i]t is true that a scale has patentable utility as a research tool" and 
that "microarray technology has patentable utility," but that "the microarray is not being claimed, 
but rather a polynucleotide that can be used in microarrays." (Final Office Action, page 6.) 
With respect to the utility of the claimed polynucleotide in toxicology testing, the Examiner is 
wrong. The claimed polynucleotide may be used as a probe on a microarray. In toxicology 
testing as described above, the claimed polynucleotide is not the object of the research. The 
claimed polynucleotide is a research tool used to assess the toxicity of drug candidates which are 
specifically targeted to other polynucleotides. It is the other polynucleotides and the drug 
candidates which are the object of the research. 

The Examiner further discounts the teaching in the Brown patent, cited by Bedilion in his 
Declaration, stating that "[t]he Brown patent claims methods of forming microarrays" (Final 
Office Action, page 6.) The Examiner ignores the teaching in the Brown patent that: 

In one application, an array of cDNA clones representing genes is 
hybridized with total cDNA from an organism to monitor gene expression for 
research or diagnostic purposes. . . This two-color experiment can be used to 
monitor gene expression in different tissue types, disease states, response to 
drugs, or response to environmental factors. (Brown, column 15, lines 5-7 and 
13-16.) 



In addition to the genetic applications listed above, arrays of whole cells, 
peptides, enzymes, antibodies, antigens, receptors, ligands, phospholipids, 
polymers, drug cogener preparations or chemical substances can be fabricated by 
the means described in this invention for large scale screening assays in medical 
diagnostics, drug discovery, molecular biology, immunology and toxicology. 
(Brown, column 15, lines 52-58.) 

6. In addition, the use of an expressed polynucleotide as a control in a 
toxicology test is a specific utility. It is irrelevant whether "[i]n this case, as indicated at the 
bottom of page 18 of the Brief [sic: Response], all nucleic acids and genes are in some 
combination useful in toxicology testing" (Final Office Action, page 7) or whether "the 
technique describe by Rockett . . . can be preformed with any polynucleotides." (Final Office 
Action, page 8.) The Examiner implies that a utility is not specific if the process carried out in 
applying that utility to an object can also be carried out on a different object. This is incorrect. 
The fact that one can apply a given process to a number of different objects does not mean that 
the process is not a specific utility when applied to a particular object. In the present case, a 
toxicology test can be carried out using any polynucleotide expressed in humans as a control, 
providing that the polynucleotide is not the target of the toxicology test. In carrying out such a 
test, a particular process can be applied using any expressed polynucleotide. However, each 
toxicology test using a given expressed polynucleotide as a control is a distinct and unique 
toxicology test because the results of the test are dependent on the identity of the expressed 
polynucleotide. A toxicology test using a given expressed polynucleotide is not interchangeable 
with a toxicology test using a different expressed polynucleotide, even if the particular process 
used in carrying out the toxicology tests is identical. The fact that the same series of steps can 
be used to carry out such toxicology tests does not prevent such tests from being a specific 
utility. 

7. The Examiner contends that "use of the claimed polynucleotide in an array 
for toxicology screening is only useful in the sense that the information that is gained from the 
array is dependent on the pattern derived from the array, and says nothing with regard to each 
individual member of the array." (Final Office Action, page 7.) Appellants reiterate that 
each individual claimed polynucleotide has utility, because the addition of each expressed 
polynucleotide to the pool of genes available for use in gene expression technology makes the 
gene expression technology (e.g., microarrays) more useful for toxicology testing. Each new 
gene available adds value to the set. The Examiner again ignores the teaching in the First 
Bedilion Declaration, which is tantamount to substituting the Examiner's own opinion for that of 
Appellants' expert. Dr. Bedilion, in his First Declaration, states that the "specification of the Lai 
'870 application would have led a person skilled in the art on June 6, 1997 who was using gene 
expression monitoring in connection with working on developing new drugs for the treatment of 
immune responses and cancers to conclude that a cDNA microarray that contained the SEQ ID 
NO:74 polynucleotide would be a highly useful tool and to request specifically that any cDNA 
microarray that was being used for such purposes contain the SEQ ID NO:74 polynucleotide." 
(First Bedilion Declaration, ¶ 15). For example, as explained by Dr. Bedilion, "[p]ersons skilled 
in the art would [have appreciated on June 6, 1997] that cDNA microarrays that contained the 
SEQ ID NO:74 polynucleotide would be a more useful tool than cDNA microarrays that did not 
contain the SEQ ID NO:74 polynucleotide in connection with conducting gene expression 
monitoring studies on proposed (or actual) drugs for treating immune responses and cancers for 
such purposes as evaluating their efficacy and toxicity." Id. 

Furthermore, the claimed polynucleotide could be used in techniques that measure gene 
expression in non-microarray formats, such as northern analysis. (First Bedilion Declaration, 
¶ 16.) 

8. The Examiner criticizes Appellants' citation of the commercial success of 
Incyte's databases as evidence of the commercial value of the contained information on the 
claimed polynucleotide. The Examiner argues that "many products which lack patentable utility 
enjoy commercial success, are actually used, and are considered valuable" including "silly fads 
such as pet rocks, but also. . . serious scientific products like orphan receptors." (Final Office 
Action, page 7.) Appellants note that there are at least two U.S. Patents claiming orphan 
receptors (U.S. Patent Nos. 5,958,710 and 6,277,976). 

9. The Examiner questions the utility of the claimed polynucleotide in 
toxicology testing, stating that "[n]either the toxic substances nor the susceptible organ systems 
are identified." (Final Office Action, page 7.) Appellants note that monitoring the expression of 
the claimed polynucleotide is a method of testing the toxicology of drug candidates during the 
drug development process. If the expression of a particular polynucleotide is affected in any way 
by exposure to a test compound, and if that particular polynucleotide (or its encoded 
polypeptide) is not the specific target of the test compound (e.g., if the test compound is a drug 
candidate), then the change in expression is an indication that the test compound may have 
undesirable toxic side effects that may limit its usefulness as a specific drug. Toxicology testing 
using microarrays reduces time needed for drug development by weeding out compounds which 
are not specific to the drug target. Learning this from an array in a gene expression monitoring 
experiment early in the drug development process costs less than learning this, for example, 
during Phase III clinical trials. It is important to note that such an indication of possible toxicity 
is specific not only for each compound tested, but also for each and every individual 
polynucleotide whose expression is being monitored. 



10. Appellants' Invention Has Specific Utility 

The Examiner alleges that "[t]his asserted utility [in toxicology testing] is not specific to 
the claimed polynucleotides, as any DNA can be placed into the microarray in order to carry out 
further research into the expression of said DNA." (Final Office Action, page 5.) 

Appellants' submission of additional Declarations and references overcomes this 
concern. Those Declarations and references demonstrate that, far from applying regardless of 
the specific properties of the claimed invention, the utility of Appellants' claimed polynucleotide 
as a gene-specific probe depends upon specific properties of the polynucleotide, that is, its 
nucleic acid sequence. 

"[E]ach probe on ... [a "high density spotted microarray []"], with careful design and 
sufficient length, and with sufficiently stringent hybridization and wash conditions, binds 
specifically and with minimal cross-hybridization, to the probe's cognate transcript" (Rockett 
Declaration, ¶ 10(i), emphasis added); "[e]ach gene included as a probe on a microarray provides 
a signal that is specific to the cognate transcript, at least to a first approximation." (Iyer 
Declaration, ¶ 7, emphasis added.) 3 Accordingly, "each additional probe makes an additional 
transcript newly detectable by the microarray, increasing the detection range, and thus versatility, 
of this analytical device for gene expression profiling" (Rockett Declaration, ¶ 10(ii)); equally, 
"[e]ach new gene-specific probe added to a microarray thus increases the number of genes 
detectable by the device, increasing the resolving power of the device." (Iyer Declaration, ¶ 7.) 
Although not required for present purposes, it would be appropriate to state on the record here 
that the specificity of nucleic acid hybridization was well-established far earlier than the 
development of high density spotted microarrays in 1995, and indeed is the well-established 
underpinning of many, perhaps most, molecular biological techniques developed over the past 
30-40 years. 

11. The Examiner's reliance on Brenner v. Manson is misplaced 

This is not a case in which biological function is necessary to provide a link between the 
claimed invention on one hand, and a compound of known utility on the other. Given that the 
claimed invention is disclosed in the Lai '870 application to be useful as a tool in a number of 
gene expression monitoring applications that were well-known at the time of the filing of the 
application in connection with the development of drugs and the monitoring of the activity of 
drugs, the precise biological function (or disease association or differential expression) of the 
claimed polynucleotide or the encoded polypeptide is superfluous information for the purposes 
of establishing utility. 

3 See Iyer Declaration, footnote at ¶ 7 for a slightly more "nuanced" view. 

The uncontested fact that the claimed invention already has a disclosed use as a tool in 
then available technology (such as cDNA microarrays) distinguishes it from those few claimed 
inventions found not to have utility. In each of those cases, unlike this one, the person of 
ordinary skill in the art was left to guess whether the claimed invention could be used to produce 
an identifiable benefit. Thus the Examiner's unsupported statement that one of those cases, 
Brenner v. Manson, 383 U.S. 519, 148 USPQ 689 (1966), is somehow analogous to this case is 
plainly incorrect. (Final Office Action, page 3.) 

Brenner concerns a narrow exception to the general rule that inventions are useful. It 
holds that where the assertion of utility for the claimed invention is made by association with a 
group including useful members, the group may not include so many useless members that there 
would be less than a substantial likelihood that the claimed invention is in fact one of the useful 
members of the group. In Brenner, the claimed invention was a process for making a synthetic 
steroid. Some steroids are useful, but most are not. While the claimed process in Brenner 
produced a composition that bore homology to some useful steroids, antitumor agents, it also 
bore structural homology to a substantial number of steroids having no utility at all. There was 
no evidence that could show, by substantial likelihood, that the claimed invention would produce 
the benefits of the small subset of useful steroids. It was entirely possible, and indeed likely, that 
the claimed invention was just as useless as the majority of steroids. 

In Brenner, the steroid was not disclosed in the application for a patent to be useful in its 
then-present form. Here, in contrast, the claimed SEQ ID NO:74 polynucleotide is an expressed 
polynucleotide that was disclosed to be useful in the Lai '870 application for many known 
applications involving gene expression monitoring analysis. Its utility is not a matter of 
guesswork. It is not a random DNA or protein sequence that might or might not be useful as a 
scientific tool. Unlike the steroid in Brenner, the utility of the invention claimed here is not 
grounded upon being structurally analogous to a molecule which belongs to a class of molecules 
containing a significant number of useless compositions. 

And, the utilities disclosed in the application are for purposes other than just studying the 
claimed invention itself, Brenner, 383 U.S. at 535, i.e., for other (non self-referential) uses such 
as to ascertain the toxic potential of a drug candidate and to study the efficacy of a proposed 
drug. Indeed, in view of the First Bedilion Declaration (at, e.g., ¶ 15), the evidence shows that 
persons skilled in the art on June 6, 1997, who read the Lai '870 application, would have 
believed the claimed polynucleotide to be so useful that they would request it to be included as a 
probe in cDNA microarrays for conducting gene expression analyses in association with 
identifying drugs for treating immune responses and cancers. 

Accordingly, in this case, biological function (or disease association or differential 
expression) is in fact superfluous information for the purposes of demonstrating utility. Here, the 
claimed invention is more than "substantially likely" to be useful, in a way that is utterly 
independent of knowledge of precise biological function, as the First Bedilion Declaration, the 
Rockett Declaration, the Iyer Declaration, and other evidence presented by the Appellants 
demonstrate. Given that the claimed invention has disclosed and well-established utilities, the 
Appellants need not demonstrate utility by imputation, or by showing disease association or 
differential expression. 

In the end, the Examiner has failed to recognize that new technologies, such as those 
involving the use of cDNA microarrays to conduct gene expression analyses, have made useful 
biological molecules that might not otherwise have been useful in the past. See Brenner, 383 
U.S. at 536. Technology has now advanced well beyond the point that a person of ordinary skill 
in the art would have to guess whether a newly discovered expressed polynucleotide or protein 
could be usefully employed without further research. It has created a need for new tools, such as 
the claimed polynucleotide, that provide, and have been providing for some time now, 
unquestioned commercial and scientific benefits, and real-world benefits to the public by 
enabling faster, cheaper and safer drug discovery processes. The Examiner is obliged, by law, to 
recognize this reality. 

IV. By requiring the patent applicant to assert a particular or unique utility, the Patent 
Examination Utility Guidelines and Training Materials applied by the Patent 
Examiner misstate the law 

There is an additional, independent reason to overturn the rejections: to the extent the 
rejections are based on Revised Interim Utility Examination Guidelines (64 FR 71427, 
December 21, 1999), the final Utility Examination Guidelines (66 FR 1092, January 5, 2001) 
and/or the Revised Interim Utility Guidelines Training Materials (USPTO Website 
www.uspto.gov, March 1, 2000), the Guidelines and Training Materials are themselves 
inconsistent with the law. 

The Training Materials, which direct the Examiners regarding how to apply the Utility 
Guidelines, address the issue of specificity with reference to two kinds of asserted utilities: 
"specific" utilities which meet the statutory requirements, and "general" utilities which do not. 
The Training Materials define a "specific utility" as follows: 

A [specific utility] is specific to the subject matter claimed. This contrasts to general 
utility that would be applicable to the broad class of invention. For example, a claim to a 
polynucleotide whose use is disclosed simply as "gene probe" or "chromosome marker" would 
not be considered to be specific in the absence of a disclosure of a specific DNA target. 
Similarly, a general statement of diagnostic utility, such as diagnosing an unspecified disease, 
would ordinarily be insufficient absent a disclosure of what condition can be diagnosed. 

The Training Materials distinguish between "specific" and "general" utilities by assessing 
whether the asserted utility is sufficiently "particular," i.e., unique (Training Materials at page 
52) as compared to the "broad class of invention." (In this regard, the Training Materials appear 
to parallel the view set forth in Stephen G. Kunin, Written Description Guidelines and Utility 
Guidelines, 82 J.P.T.O.S. 77, 97 (Feb. 2000) ("With regard to the issue of specific utility the 
question to ask is whether or not a utility set forth in the specification is particular to the claimed 
invention.")). 

Such "unique" or "particular" utilities never have been required by the law. To meet the 
utility requirement, the invention need only be "practically useful," Natta, 480 F.2d at 1397, 
and confer a "specific benefit" on the public. Brenner, 383 U.S. at 534. Thus, incredible 
"throwaway" utilities, such as trying to "patent a transgenic mouse by saying it makes great 
snake food," do not meet this standard. Karen Hall, Genomic Warfare, The American Lawyer 
68 (June 2000) (quoting John Doll, Chief of the Biotech Section of USPTO). 

This does not preclude, however, a general utility, contrary to the statement in the 
Training Materials where "specific utility" is defined (page 5). Practical real-world uses are not 
limited to uses that are unique to an invention. The law requires that the practical utility be 
"definite," not particular. Montedison, 664 F.2d at 375. Appellants are not aware of any court 
that has rejected an assertion of utility on the grounds that it is not "particular" or "unique" to the 
specific invention. Where courts have found utility to be too "general," it has been in those cases 
in which the asserted utility in the patent disclosure was not a practical use that conferred a 
specific benefit. That is, a person of ordinary skill in the art would have been left to guess as to 
how to benefit at all from the invention. In Kirk, for example, the CCPA held the assertion that a 
man-made steroid had "useful biological activity" was insufficient where there was no 
information in the specification as to how that biological activity could be practically used. Kirk, 
376 F.2d at 941. 

The fact that an invention can have a particular use does not provide a basis for requiring 
a particular use. See Brana, supra (disclosure describing a claimed antitumor compound as 
being homologous to an antitumor compound having activity against a "particular" type of 
cancer was determined to satisfy the specificity requirement). "Particularity" is not and never 
has been the sine qua non of utility; it is, at most, one of many factors to be considered. 

As described supra, broad classes of inventions can satisfy the utility requirement so long 
as a person of ordinary skill in the art would understand how to achieve a practical benefit from 
knowledge of the class. Only classes that encompass a significant portion of nonuseful members 
would fail to meet the utility requirement. Supra § III.B. (Montedison, 664 F.2d at 374-75). 

The Training Materials fail to distinguish between broad classes that convey information 
of practical utility and those that do not, lumping all of them into the latter, unpatentable 
category of "general" utilities. As a result, the Training Materials paint with too broad a brush. 
Rigorously applied, they would render unpatentable whole categories of inventions that 
heretofore have been considered to be patentable and that have indisputably benefited the public, 
including the claimed invention. See supra § III.B. Thus the Training Materials cannot be 
applied consistently with the law. 

Issue 2: Enablement Rejection of Claims 25-33, 39, 41, 43, 44, and 45 

The rejection set forth in the Final Office Action is based on the assertions discussed 
above, i.e., that the claimed invention lacks patentable utility. To the extent that the rejection 
under 35 U.S.C. § 112, first paragraph, is based on the improper allegation of lack of patentable 
utility under 35 U.S.C. § 101, it fails for the same reasons. 

Issue 3: Enablement Rejection of Claims 25, 28-30, 32, 33, 39, 41, and 43-45 with 
respect to fragments, variants, arrays, complementary sequences, and RNA equivalents 

The Examiner further contended that the claimed polynucleotides encoding variants of 
SEQ ID NO:37, polynucleotides encoding fragments of SEQ ID NO:37, polynucleotide variants 
of SEQ ID NO:74, fragments of SEQ ID NO:74, fragments of polynucleotide variants of SEQ ID 
NO:74, complementary polynucleotide sequences and RNA equivalents to the above, and arrays 
comprising the above are not enabled. The Examiner states that "[t]he specification does not 
enable any person skilled in the art to which it pertains or with which it is most nearly connected, 
to make/use the invention commensurate in scope with these claims." (Final Office Action, page 
22.) 

The claimed polynucleotides are enabled, i.e., they are supported by the Specification and 
what is well known in the art. 

I. How to Make 

SEQ ID NO:37 and SEQ ID NO:74 are specifically disclosed in the application (see, for 
example, pages 95-96 and pages 113-114 of the Sequence Listing). Variants of SEQ ID NO:37 
and SEQ ID NO:74 are disclosed, for example, on page 33, lines 1-18. Incyte clones in which 
the nucleic acids encoding the human NHRP-37 were first identified and libraries from which 
those clones were isolated are disclosed, for example, on page 32, lines 18-23. Chemical and 
structural features of NHRP-37 are disclosed, for example, on page 32, lines 24-30. 

The Examiner alleged that "even a single amino acid substitution or what appears to be a 
minor modification will often dramatically affect the biological activity of a protein," and "it 
could not be predicted that a variant polynucleotide, or polynucleotide encoding a variant protein 
would have equivalent functional characteristic of the polynucleotide which encodes SEQ ID 
NO:37." (Final Office Action, page 23.) However, Appellants submit that the polypeptide 
variant sequences and polynucleotide variant sequences are described by their being "naturally 
occurring" and by their percentage sequence identity with SEQ ID NO:37 and SEQ ID NO:74 
and not by biological activity. The choice of amino acids or nucleotides to alter is made by 
nature. "Naturally occurring" polypeptide variant sequences and polynucleotide variant 
sequences occur in nature; they are not created exclusively in a laboratory. The Specification 
teaches how to find polynucleotide variants (e.g., page 55, lines 19-23) which can then be 
expressed to make polypeptide variants and how to use BLAST to determine whether a given 
naturally occurring polynucleotide sequence falls within the "at least 95% identical to the 
polynucleotide sequence of SEQ ID NO:74" scope and whether a given naturally occurring 
amino acid sequence falls within the "at least 95% identical to the amino acid sequence of SEQ 
ID NO:37" scope (e.g., page 63, line 10 through page 64, line 5). In addition, determination of 
percentage identity is well known in the art. 

The making of the claimed polynucleotides and RNA equivalents by recombinant and 
chemical synthetic methods is disclosed in the Specification, at, e.g., page 33, line 29 through 
page 34, line 3, page 36, line 30 through page 37, line 2, and page 37, lines 16-26. The making 
of the claimed arrays is disclosed in the Specification at, e.g., page 58, line 14 through page 59, 
line 13, and page 67, line 22 through page 68, line 10. The making of the claimed 
polynucleotides comprising complementary sequences is disclosed in the Specification at, e.g., 
page 48, lines 26-29, and page 68, lines 16-25. 

Appellants submit that the specification fully enables the making of the claimed 
polynucleotides encoding immunogenic fragments of SEQ ID NO:37. The polypeptide sequence 
of SEQ ID NO:37 is provided in the Sequence Listing. Preparation of immunogenic fragments is 
described in the Specification, e.g., at page 47, lines 4-10 and page 69, line 22 through page 70, 
line 2. 

The ability of a given fragment to induce a specific immune response in animals or cells, 
to bind with specific antibodies, or to elicit production of antibodies that bind to the full-length 
NHRP-37 (see Specification at, e.g., page 12, lines 6-8, page 46, line 22 through page 48, line 
14, and page 69, line 21 through page 70, line 6) is a test of whether the fragment is 
"immunogenic." Testing fragments by these methods does not require undue experimentation; 
the Specification provides a test for antibody binding, e.g., at page 61, lines 13-16. The making 
of antibodies is disclosed in the Specification at, e.g., page 46, line 22 through page 48, line 14 
and page 69, line 21 through page 70, line 6. 

This satisfies the "how to make" requirement of 35 U.S.C. § 112, first paragraph. 

II. How to Use 

The claimed polynucleotide variants, fragments, RNA equivalents, and complementary 
sequences are products of expressed genes. The claimed arrays comprise products of expressed 
genes. Therefore, these polynucleotides and arrays are useful for the same purposes as the 
polynucleotides comprising the polynucleotide sequence of SEQ ID NO:74 and the 
polynucleotide encoding the polypeptide sequence of SEQ ID NO:37. These utilities are 
described fully under the rejection under § 101 (Issue 1, supra) of this Brief and in the First 
Bedilion Declaration, Rockett Declaration, Iyer Declaration, and Second Bedilion Declaration. 
In addition, the Specification discloses the use of complementary polynucleotides in antisense 
technology, e.g., on page 11, line 25 through page 12, line 3, page 48, lines 16-23, page 49, lines 
7-18, page 58, lines 18-26, and page 68, lines 16-25. The Specification also discloses the 
use of arrays, e.g., on page 58, line 8 through page 59, line 28 and page 67, line 21 through page 
68, line 14. This satisfies the "how to use" requirement of 35 U.S.C. § 112, first paragraph. 

The Examiner cited Burgess et al., Lazar et al., Mathews and Van Holde, Matthews, and 
Bork in support of the argument that the claimed variant polynucleotides and recited variant 
polypeptides may have different biological functions than SEQ ID NO:74 and SEQ ID NO:37. 
However, these documents do not support the enablement rejection, as the Specification, along 
with what is well known to one of skill in the art, enables the use of the claimed variant 
polynucleotides and the claimed polynucleotides encoding variant polypeptides in toxicology 
testing by virtue of their being expressed polynucleotides, or encoding expressed polypeptides, 
regardless of their biological function. The Examiner has confused use with biological function. 

The Examiner further contends that "the specification does not teach how to use a probe 
comprising a sense strand of SEQ ID NO:74 or a sense strand of a naturally occurring variant of 
SEQ ID NO:74." (Final Office Action, page 23.) One of skill would be able to use 
embodiments of Claim 32 in an array. Furthermore, the Specification teaches the use of these 
polynucleotides in PCR reactions in methods of measuring the expression of the claimed 
polynucleotides, e.g.: 

Additional diagnostic uses for oligonucleotides designed from the 
sequences encoding NHRP may involve the use of PCR. Such oligomers may be 
chemically synthesized, generated enzymatically, or produced in vitro. 
Oligomers will preferably consist of two nucleotide sequences, one with sense 
orientation (5'->3') and another with antisense (3'<-5'), employed under 
optimized conditions for identification of a specific gene or condition. The same 
two oligomers, nested sets of oligomers, or even a degenerate pool of oligomers 
may be employed under less stringent conditions for detection and/or quantitation 
of closely related DNA or RNA sequences. (Specification, page 57, lines 23-30, 
emphasis added.) 

The Examiner further alleges that "Applicant has not taught how o [sic] use an antibody 
that binds to the polypeptide encoded by the claimed polynucleotide because there is not 
biological function, significance, or correlation to a disease state associated with the disclosed 
polynucleotide" and that "[o]ne of skill would not know how to use antibodies which bind to 
SEQ ID NO:37 or antibodies which bind to variant of SEQ ID NO:37 having at least 95% 
sequence similarity to SEQ ID NO:37 for the same reason." (Final Office Action, page 27.) 

Antibodies which bind to polypeptides encoded by the claimed polynucleotides can be 
used, e.g., in toxicology testing to measure the expression of polypeptides encoded by the 
claimed polynucleotides. For example, Rockett discusses antibody microarrays in his 
Declaration, stating that: 

Although the protein expression profiles produced by 2D-PAGE analysis 
are analogous to the transcript expression profiles provided by nucleic acid 
microarrays, an even closer analogy is perhaps offered by antibody microarrays; 
as I note in my Drug Discovery Today commentary, such antibody microarrays 
date back to the work of Roger Ekins in the mid- to late-1980s. (Rockett 
Declaration, ¶ 13.) 

Rockett cites the following publications with respect to antibody microarrays: Ekins et 
al., J. Bioluminescence Chemiluminescence 5:59-78 (1989); Ekins et al., Clin. Chem. 37:1955-1965 
(1991); and Ekins, U.S. Patent Nos. 5,432,099, 5,807,755, and 5,837,551. (Rockett 
Declaration, ¶ 13 and Rockett Exhibits M to Q.) 

Rockett further states with respect to antibody microarrays that 

... as with nucleic acid microarrays, the greater the number of proteins 
detectable, the greater the power of the technique; the absence or failure of a 
protein to change in expression levels does not diminish the usefulness of the 
method; and prior knowledge of the biological function of the protein is not 
required. As applied to protein expression profiling, these principles have been 
well understood since at least as early as the 1980s. (Rockett Declaration, ¶ 14.) 

Issue 4: Written Description Rejection of Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 
45 with respect to allegedly new matter 

The Examiner rejected Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 45 under 35 U.S.C. 
§112, first paragraph, stating that the claims were not adequately described because they 
allegedly contain "new matter." 

I. With respect to "naturally occurring" and "at least 95% identical to" 

The Examiner alleged that "a naturally occurring amino sequence at least 95% identical to 
the amino acid sequence of SEQ ID NO:37 and a polynucleotide comprising a naturally 
occurring polynucleotide sequence at least 95% identical to the polynucleotide of SEQ ID 
NO:74" were not supported in the original disclosure. The Examiner noted that "[t]he 
specification contemplates allelic sequences on page 10, lines 1-7, and NHRP variants having 
90% sequence identity [to] the NHRP sequence, however, this is not adequate basis for naturally 
occurring amino acid sequences having at least 90% identity to SEQ ID NO:37 or naturally 
occurring polynucleotide sequences having 90% sequence identity to SEQ ID NO:74, or a 
variant which is at least 95% identical to SEQ ID NO:37 encoded by SEQ ID NO:74, or a 
polynucleotide comprising an allelic sequence having at least 95% identity to SEQ ID NO:74.)" 
(Final Office Action, page 29.) Appellants note that the claims were amended to recite "at least 
95% identical" with the Response filed January 27, 2003; the claims no longer recite "at least 
90% identical." 

Naturally occurring polypeptide sequences are supported in the Specification, e.g., at 
page 9, lines 23-26: 

NHRP, as used herein, refers to the amino acid sequences of substantially 
purified NHRP obtained from any species, particularly mammalian, including 
bovine, ovine, porcine, murine, equine, and preferably human, from any source 
whether natural, synthetic, semi-synthetic, or recombinant. 

Polypeptides comprising a sequence at least 95% identical to the amino acid sequence of 
SEQ ID NO:37 are supported in the Specification, e.g., at page 33, lines 3-5: 

A most preferred NHRP variant is one having at least 95% amino acid 
sequence identity to an NHRP disclosed herein (SEQ ID NOs:1-37). 

Case law provides that to fulfill the written description requirement of 35 U.S.C. §112, 
first paragraph, ". . . the applicant must also convey with reasonable clarity to those skilled in 
the art that, as of the filing date sought, he or she was in possession of the invention. The 
invention is, for purposes of the 'written description' inquiry, whatever is now claimed." 
Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117 (Fed. Cir. 1991). Consideration of the 
originally filed application shows that Appellants were in possession of what is now claimed, 
i.e., "a naturally occurring polynucleotide sequence at least 95% identical to the polynucleotide 
sequence of SEQ ID NO:74." 

In this regard, see the following portions of the Specification as well as those cited above: 

It will be appreciated by those skilled in the art that as a result of the 
degeneracy of the genetic code, a multitude of nucleotide sequences encoding 
NHRP, some bearing minimal homology to the nucleotide sequences of any 
known and naturally occurring gene, may be produced. Thus, the invention 
contemplates each and every possible variation of nucleotide sequence that could 
be made by selecting combinations based on possible codon choices. These 
combinations are made in accordance with the standard triplet genetic code as 
applied to the nucleotide sequence of naturally occurring NHRP, and all such 
variations are to be considered as being specifically disclosed. (Specification, 
page 33, lines 11-18.) 

Before the present proteins, nucleotide sequences, and methods are 
described, it is understood that this invention is not limited to the particular 
methodology, protocols, cell lines, vectors, and reagents described, as these may 
vary. It is also to be understood that the terminology used herein is for the 
purpose of describing particular embodiments only, and is not intended to limit 
the scope of the present invention which will be limited only by the appended 
claims. (Specification, page 9, lines 1-6.) 



Thus, while the originally filed application does not contain a verbatim recitation of the 
present "at least 95% identical to the polynucleotide sequence . . ." claim language, it is apparent 
that the inventors contemplated naturally occurring polynucleotide sequences of NHRP 
molecules at least 95% identical to the polynucleotide sequence of SEQ ID NO:74 by virtue of 
contemplating naturally occurring polypeptide sequences of NHRP molecules at least 95% 
identical to the polypeptide sequence of SEQ ID NO:37. 

Accordingly, the "at least 95% identical to the polynucleotide sequence . . ." language 
appearing in Claim 32 does not represent new matter. 

The Examiner further alleges that "[t]he specification or originally filed claims did not 
contemplate arrays comprising oligonucleotides complementary to polynucleotide having 95% 
identity to SEQ ID NO:74." (Final Office Action, page 29.) Appellants submit that such arrays 
are contemplated in the Specification. See the discussion supra, the Specification at, e.g., page 
12, lines 9-17, and page 68, lines 16-25, as well as below. 

In further embodiments, oligonucleotides derived from any of the 
polynucleotide sequences described herein may be used in microarrays. 

(Specification, page 58, lines 8-9, emphasis added.) 

In another embodiment of the invention, the polynucleotides encoding 
NHRP may be used for diagnostic purposes. The polynucleotides which may be 
used include oligonucleotide sequences, complementary RNA and DNA 
molecules, and PNAs. The polynucleotides may be used to detect and quantitate 
gene expression in biopsied tissues in which expression of NHRP may be 
correlated with disease. The diagnostic assay may be used to distinguish between 
absence, presence, and excess expression of NHRP, and to monitor regulation of 
NHRP levels during therapeutic intervention. 

In one aspect, hybridization with PCR probes which are capable of 
detecting polynucleotide sequences, including genomic sequences, encoding 
NHRP or closely related molecules, may be used to identify nucleic acid 
sequences which encode NHRP. The specificity of the probe, whether it is made 
from a highly specific region, e.g., 10 unique nucleotides in the 5' regulatory 
region, or a less specific region, e.g., especially in the 3' coding region, and the 
stringency of the hybridization or amplification (maximal, high, intermediate, or 
low) will determine whether the probe identifies only naturally occurring 
sequences encoding NHRP, alleles, or related sequences. 

Probes may also be used for the detection of related sequences, and 
should preferably contain at least 50% of the nucleotides from any of the 
NHRP encoding sequences. The hybridization probes of the subject invention 
may be DNA or RNA and derived from the nucleotide sequence of SEQ ID 
NOs:38-74 or from genomic sequence including promoter, enhancer elements, 
and introns of the naturally occurring NHRP. (Specification, page 55, lines 4-23, 
emphasis added.) 

II. Claim 33, with respect to "60 contiguous nucleotides of a polynucleotide of claim 32" 

The Examiner rejected Claim 33 on the basis of new matter, stating that "claim 33 
persists in incorporating the limitation of 60 consecutive nucleotides of claim 32, although page 
17, lines 14-17 [of the Office Action mailed September 25, 2002] state that the new limitation of 
'60 consecutive nucleotides' was not contemplated in the specification or claims as originally 
filed." (Final Office Action, page 29.) 

The polynucleotides of Claim 33 are supported in the Specification as filed. For 
example, at page 15, lines 9-10: "'Fragments' are those nucleic acid sequences which are 
greater than 60 nucleotides than [sic] in length." At page 7, lines 3-4: "In another aspect the 
invention provides compositions comprising isolated and purified polynucleotide sequences of 
SEQ ID NOs:38-74 or fragments thereof." At page 15, lines 12-13: "The term 'oligonucleotide' 
refers to a nucleic acid sequence of at least about 6 nucleotides to about 60 nucleotides." 

Issue 5: Written Description Rejection of Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 
45 with respect to polynucleotides comprising a naturally-occurring polynucleotide 
sequence at least 95% identical to the polynucleotide sequence of SEQ ID NO: 74 or the 
claimed polynucleotide encoding a polypeptide comprising a naturally-occurring amino 
acid sequence at least 95% identical to the amino acid sequence of SEQ ID NO:37 

Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 45 have been further rejected under the first 
paragraph of 35 U.S.C. §112 for alleged lack of an adequate written description. The Examiner 
alleged that "the written description is not commensurate in scope with the claims drawn to 
polynucleotides encoding naturally occurring amino acids [sic] sequences having 95% sequence 
identity to SEQ ID NO:37 or polynucleotides comprising a naturally occurring polynucleotide 
sequences [sic] at least 95% identical to SEQ ID NO:74." (Final Office Action, page 31.) The 
Examiner further alleged that "neither the common attributes of the genus nor specific examples 
of species representative of the genus have been described" and "[w]ith the exception of SEQ ID 
NO:74, and the polynucleotides encoding SEQ ID NO:37, the skilled artisan cannot envision the 
detailed structure of the encompassed polynucleotides and therefore conception is not achieved 
until reduction to practice has occurred, regardless of the complexity or simplicity of the method 
of isolation." (Final Office Action, page 32.) 

The requirements necessary to fulfill the written description requirement of 35 U.S.C. 
§112, first paragraph, are well established by case law. 

... the applicant must also convey with reasonable clarity to those skilled 
in the art that, as of the filing date sought, he or she was in possession of the 
invention. The invention is, for purposes of the "written description" inquiry, 
whatever is now claimed. Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117 
(Fed. Cir. 1991) 

Attention is also drawn to the Patent and Trademark Office's own "Guidelines for 
Examination of Patent Applications Under the 35 U.S.C. Sec. 112, para. 1", published January 5, 
2001, which provide that: 

An applicant may also show that an invention is complete by disclosure of 
sufficiently detailed, relevant identifying characteristics which provide evidence 
that applicant was in possession of the claimed invention, i.e., complete or partial 
structure, other physical and/or chemical properties, functional characteristics 
when coupled with a known or disclosed correlation between function and 
structure, or some combination of such characteristics. What is conventional or 
well known to one of ordinary skill in the art need not be disclosed in detail. 
If a skilled artisan would have understood the inventor to be in possession of 
the claimed invention at the time of filing, even if every nuance of the claims 
is not explicitly described in the specification, then the adequate description 
requirement is met. (citations omitted, emphasis added.) 

Thus, the written description standard is fulfilled by both what is specifically disclosed 
and what is conventional or well known to one skilled in the art. 

SEQ ID NO:37 and SEQ ID NO:74 are specifically disclosed in the application (see, for 
example, pages 95-96 and 113-114 of the Sequence Listing). Variants of SEQ ID NO:37 are 
described, for example, at page 17, lines 8-16. In particular, the preferred, more preferred, and 
most preferred SEQ ID NO:37 variants (80%, 90%, and 95% amino acid sequence identity to 
SEQ ID NO:37) are described, for example, at page 33, lines 1-5. Incyte clones in which the 
nucleic acids encoding the human NHRP-37 were first identified and libraries from which those 
clones were isolated are described, for example, at page 32, lines 18-23 of the Specification. 
Chemical and structural features of NHRP-37 are described, for example, on page 32, lines 24- 
30. Given SEQ ID NO:37, one of ordinary skill in the art would recognize naturally-occurring 
variants of SEQ ID NO:37 at least 95% identical to SEQ ID NO:37. Given SEQ ID NO:74, one 
of ordinary skill in the art would recognize naturally-occurring variants of SEQ ID NO:74 at 
least 95% identical to SEQ ID NO:74. The Specification describes (e.g., page 63, line 10 
through page 64, line 5) how to use BLAST to determine whether a given sequence falls within 
the "at least 95% identical" scope. Immunogenic fragments are described in the Specification, 
e.g., at page 12, lines 6-8. 

There simply is no requirement that the claims recite particular variant and fragment 
polypeptide or polynucleotide sequences because the claims already provide sufficient structural 
definition of the claimed subject matter. That is, the polypeptide variants and fragments are 
defined in terms of SEQ ID NO:37 ("An isolated polynucleotide encoding a polypeptide selected 
from the group consisting of. . . b) a polypeptide comprising a naturally occurring amino acid 
sequence at least 95% identical to the amino acid sequence of SEQ ID NO:37, and c) an 
immunogenic fragment of a polypeptide having the amino acid sequence of SEQ ID NO:37.") 
The polynucleotide variants and fragments are defined in terms of SEQ ID NO:74 ("An isolated 
polynucleotide selected from the group consisting of. . . : b) a polynucleotide comprising a 
naturally occurring polynucleotide sequence at least 95% identical to the polynucleotide 
sequence of SEQ ID NO:74;" "An isolated polynucleotide comprising at least 60 contiguous 
nucleotides of a polynucleotide of claim 32.") 

Because the recited polypeptide variants and fragments are defined in terms of SEQ ID 
NO:37, and the recited polynucleotide variants and fragments are defined in terms of SEQ ID 
NO:37 and SEQ ID NO:74, the precise chemical structure of every polypeptide variant and 
fragment and every polynucleotide variant and fragment within the scope of the claims can be 
discerned. The Examiner's position is nothing more than a misguided attempt to require 
Appellants to unduly limit the scope of their claimed invention. Appellants further submit that 
given the polypeptide sequence of SEQ ID NO:37 and the polynucleotide sequence of SEQ ID 
NO:74, it would be redundant to list specific fragments. The structures of SEQ ID NO:37 and 
SEQ ID NO:74 provide the blueprint for all fragments thereof. Listing all possible fragments of 
SEQ ID NO: 37 and SEQ ID NO:74 is, thus, a superfluous exercise which would needlessly 
clutter the Specification. Accordingly, the Specification provides an adequate written 
description of the recited polypeptide and polynucleotide sequences. 

I. The present claims specifically define the claimed genus through the recitation of 
chemical structure 

Court cases in which "DNA claims" have been at issue commonly emphasize that the 
recitation of structural features or chemical or physical properties is an important factor to 
consider in a written description analysis of such claims. For example, in Fiers v. Revel, 25 
USPQ2d 1601, 1606 (Fed. Cir. 1993), the court stated that: 

If a conception of a DNA requires a precise definition, such as by 
structure, formula, chemical name or physical properties, as we have held, then a 
description also requires that degree of specificity. 

In a number of instances in which claims to DNA have been found invalid, the courts 
have noted that the claims attempted to define the claimed DNA in terms of functional 
characteristics without any reference to structural features. As set forth by the court in 
University of California v. Eli Lilly and Co., 43 USPQ2d 1398, 1406 (Fed. Cir. 1997): 

In claims to genetic material, however, a generic statement such as 
"vertebrate insulin cDNA" or "mammalian insulin cDNA," without more, is not 
an adequate written description of the genus because it does not distinguish the 
claimed genus from others, except by function. 

Thus, the mere recitation of functional characteristics of a DNA, without the definition of 
structural features, has been a common basis by which courts have found invalid claims to DNA. 
For example, in Lilly, 43 USPQ2d at 1407, the court found invalid for violation of the written 
description requirement the following claim of U.S. Patent No. 4,652,525: 

1. A recombinant plasmid replicable in procaryotic host containing within 
its nucleotide sequence a subsequence having the structure of the reverse 
transcript of an mRNA of a vertebrate, which mRNA encodes insulin. 

In Fiers, 25 USPQ2d at 1603, the parties were in an interference involving the following 
count: 

A DNA which consists essentially of a DNA which codes for a human 
fibroblast interferon-beta polypeptide. 

Party Revel in the Fiers case argued that its foreign priority application contained an 
adequate written description of the DNA of the count because that application mentioned a 
potential method for isolating the DNA. The Revel priority application, however, did not have a 
description of any particular DNA structure corresponding to the DNA of the count. The court 
therefore found that the Revel priority application lacked an adequate written description of the 
subject matter of the count. 

Thus, in Lilly and Fiers, nucleic acids were defined on the basis of functional 
characteristics and were found not to comply with the written description requirement of 35 
U.S.C. §112; i.e., "an mRNA of a vertebrate, which mRNA encodes insulin" in Lilly, and "DNA 
which codes for a human fibroblast interferon-beta polypeptide" in Fiers. In contrast to the 
situation in Lilly and Fiers, the claims at issue in the present application define polynucleotides 
and polypeptides in terms of chemical structure, rather than functional characteristics. For 
example, the "variant language" of independent Claims 25 and 32 recites chemical structure to 
define the claimed genus: 

25. An isolated polynucleotide encoding a polypeptide selected from the 
group consisting of:. . . 

b) a polypeptide comprising a naturally occurring amino acid 
sequence at least 95% identical to the amino acid sequence of SEQ ID NO:37. . . 

32. An isolated polynucleotide selected from the group consisting of . . . : 

b) a polynucleotide comprising a naturally occurring polynucleotide 
sequence at least 95% identical to the polynucleotide sequence of SEQ ID NO:74. 

From the above it should be apparent that the claims of the subject application are 
fundamentally different from those found invalid in Lilly and Fiers. The subject matter of the 
present claims is defined in terms of the chemical structure of SEQ ID NO:37 and SEQ ID 
NO:74. In the present case, there is no reliance merely on a description of functional 
characteristics of the polynucleotides and polypeptides recited by the claims. Such functional 
recitations that are included add to the structural characterization of the recited polypeptides and 
polynucleotides. The polynucleotides and polypeptides defined in the claims of the present 
application recite structural features, and cases such as Lilly and Fiers stress that the recitation of 
structure is an important factor to consider in a written description analysis of claims of this type. 
By failing to base its written description inquiry "on whatever is now claimed," the Final Office 
Action failed to provide an appropriate analysis of the present claims and how they differ from 
those found not to satisfy the written description requirement in Lilly and Fiers. 

II. The present claims do not define a genus which is "highly variant" 

Furthermore, the claims at issue do not describe a genus which could be characterized as 
"highly variant." (Final Office Action, page 35.) Available evidence illustrates that the claimed 
genus is of narrow scope. 

In support of this assertion, the Board's attention is directed to the enclosed reference by 
Brenner et al. ("Assessing sequence comparison methods with reliable structurally identified 
distant evolutionary relationships," Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078) (Reference 
No. 17). Through exhaustive analysis of a data set of proteins with known structural and 
functional relationships and with <90% overall sequence identity, Brenner et al. have determined 
that 30% identity is a reliable threshold for establishing evolutionary homology between two 
sequences aligned over at least 150 residues. (Brenner et al., pages 6073 and 6076.) 
Furthermore, local identity is particularly important in this case for assessing the significance of 
the alignments, as Brenner et al. further report that >40% identity over at least 70 residues is 
reliable in signifying homology between proteins. (Brenner et al., page 6076.) 

The present application is directed, inter alia, to regulatory proteins related to the amino 
acid sequence of SEQ ID NO:37. In accordance with Brenner et al., naturally occurring 
molecules may exist which could be characterized as regulatory proteins and which have as little 
as 40% identity over at least 70 residues to SEQ ID NO:37. The "variant language" of the 
present claims recites, for example, polynucleotides encoding "a polypeptide . . . comprising a 
naturally occurring amino acid sequence at least 95% identical to the amino acid sequence of 
SEQ ID NO:37" (note that SEQ ID NO:37 has 350 amino acid residues). This variation is far 
less than that of all potential regulatory proteins related to SEQ ID NO:37, i.e., those regulatory 
proteins having as little as 40% identity over at least 70 residues to SEQ ID NO:37. 
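
By way of illustration only, the arithmetic behind the "at least 95% identical" limitation can be sketched as follows. The sketch assumes that identity is counted position by position over an ungapped, full-length alignment of two equal-length sequences; this is a simplification and is not the BLAST procedure referred to in the Specification. The function names and the toy sequences are hypothetical and do not appear in the application.

```python
# Illustrative sketch only: position-by-position identity over an ungapped,
# equal-length alignment (an assumption; BLAST itself aligns locally and
# may introduce gaps). All names here are hypothetical.

def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which two sequences match."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def max_differences(length: int, min_identity_pct: int) -> int:
    """Largest number of differing positions that still meets the threshold."""
    return length * (100 - min_identity_pct) // 100

def meets_threshold(a: str, b: str, min_identity_pct: int = 95) -> bool:
    """True if the two sequences are at least min_identity_pct identical."""
    return percent_identity(a, b) >= min_identity_pct

# Toy stand-in for a 350-residue reference (not the actual SEQ ID NO:37).
reference = "A" * 350
allowed = max_differences(len(reference), 95)  # differences tolerated at 95%
variant_inside = "G" * allowed + "A" * (350 - allowed)
variant_outside = "G" * (allowed + 1) + "A" * (350 - allowed - 1)
print(allowed, meets_threshold(reference, variant_inside),
      meets_threshold(reference, variant_outside))
```

Under these assumptions, a 350-residue variant may differ from the reference at no more than 17 positions and still satisfy the 95% limitation, which indicates how narrowly the recited genus is bounded.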

III. The state of the art at the time of the present invention is further advanced than at 
the time of the Lilly and Fiers applications 

In the Lilly case, claims of U.S. Patent No. 4,652,525 were found invalid for failing to 
comply with the written description requirement of 35 U.S.C. §112. The '525 patent claimed the 
benefit of priority of two applications, Application Serial No. 801,343 filed May 27, 1977, and 
Application Serial No. 805,023 filed June 9, 1977. In the Fiers case, party Revel claimed the 
benefit of priority of an Israeli application filed on November 21, 1979. Thus, the written 
description inquiry in those cases was based on the state of the art at essentially the "dark ages" 
of recombinant DNA technology. 

The present application has a priority date of June 6, 1997. Much has happened in the 
development of recombinant DNA technology in the 17 or more years between the filing of 
the applications involved in Lilly and Fiers and the filing of the present application. For example, the 
technique of polymerase chain reaction (PCR) was invented. Highly efficient cloning and DNA 
sequencing technology has been developed. Large databases of protein and nucleotide 
sequences have been compiled. Much of the raw material of the human and other genomes has 
been sequenced. With these remarkable advances one of skill in the art would recognize that, 
given the sequence information of SEQ ID NO:37 and SEQ ID NO:74, and the additional 
extensive detail provided by the subject application, the present inventors were in possession of 
the claimed polynucleotides encoding polypeptide variants and polypeptide fragments, the 
claimed polynucleotide variants, and the claimed polynucleotide fragments at the time of filing 
of this application. 

IV. The Examiner questions the value of Appellants' statements in the Responses filed 
February 20, 2003 and July 23, 2003 that "one of skill in the art would know how to use the 
BLAST program to determine 95% identity." (Final Office Action, pages 32 and 36.) Appellants 
note that the claimed polynucleotides are sufficiently described. One of skill in the art would be 
able to describe polynucleotides comprising naturally occurring sequences that fall within the 
claimed limitations of percentage identity with SEQ ID NO:74 and to describe polynucleotides 
encoding polypeptides comprising naturally occurring sequences that fall within the claimed 
limitations of percentage identity with SEQ ID NO:37. 

The Examiner further alleges that "the instant genuses are not limited by functional 
attributes." (Final Office Action, pages 33 and 37.) However, functional limitations are not 
necessary as the structural and source limitations are sufficient to describe the claimed 
polynucleotide variants and claimed polynucleotides encoding polypeptide variants, and, in any 
case, "function" is irrelevant to the use of the claimed polynucleotide variants and claimed 
polynucleotides encoding polypeptide variants in toxicology testing. 

The Examiner states that Appellants' arguments are "not persuasive in light of the written 
description requirements which requires, in the absence of a recitation of a number of 
representative specifies [sic] of the genus, a function correlation with the disclosed single 
member of the genus." (Final Office Action, page 36.) Appellants note that function is not a 
requirement for adequate written description. 

V. The Examiner contends that "reliance on %90 or %95 [sic] sequence identity does not 
guarantee that the variants will have the same functional attributes as SEQ ID NO:37." (Final 
Office Action, pages 33 and 37-38.) As the claimed variants are not described by their having 
the same "function" as SEQ ID NO:37 or SEQ ID NO:74, the Examiner's arguments are not 
relevant to the written description issue. 

Nevertheless, Appellants note that it is well known in the art that sequence similarity is 
predictive of similarity in functional activity. Hegyi and Gerstein (H. Hegyi and M. Gerstein, 
"Annotation Transfer for Genomics: Measuring Functional Divergence in Multi-Domain 
Proteins," Genome Research (2001) 11:1632-1640; Reference No. 18) conclude that "the 
probability that two single-domain proteins that have the same superfamily structure have the 
same function (whether enzymatic or not) is about 2/3." (Hegyi and Gerstein, Reference No. 18, 
page 1635.) Hegyi and Gerstein also concluded that, for multi-domain proteins with "almost 
complete coverage with exactly the same type and number of superfamilies, following each other 
in the same order" "[t]he probability that the functions are the same in this case was 91%." 
(Hegyi and Gerstein, Reference No. 18, page 1636.) Hegyi and Gerstein (Reference No. 18, 
page 1632) further note that: 

Wilson et al. (2000) compared a large number of protein domains to one 
another in a pair-wise fashion with respect to similarities in sequence, structure, 
and function. Using a hybrid functional classification scheme merging the 
ENZYME and FlyBase systems (Gelbart et al. 1997; Bairoch 2000), they found 
that precise function is not conserved below 30-40% identity, although the broad 
functional class is usually preserved for sequence identities as low as 20-25%, 
given that the sequences have the same fold. Their survey also reinforced the 
previously established general exponential relationship between structural and 
sequence similarity (Chothia and Lesk 1986). 

The polypeptides encoded by the claimed polynucleotides share at least 95% sequence 
identity with the SEQ ID NO:37 polypeptide, well above the thresholds described in the Hegyi 
and Gerstein article (Reference No. 18) cited above. Therefore, there is a reasonable probability 
that the SEQ ID NO:37 polypeptide variants would have the same function as the SEQ ID NO:37 
polypeptide. 

VI. In the Response filed January 27, 2003, Appellants asserted that Brenner teaches that 
"30% identity is a reliable threshold for establishing evolutionary homology between two 
sequences aligned over at least 150 residues" and that ">40% identity over at least 70 residues is 
reliable in signifying homology between proteins," and that the genus of 
polypeptides at least 95% identical to SEQ ID NO:37 would therefore more likely than not function 
similarly to the polypeptide of SEQ ID NO:37. 

The Examiner, however, dismisses Appellants' arguments, alleging that "Brenner is 
predicting evolutionary relationships within a database of orthologs which are identified 
independently of sequence comparison" and that evolutionary relationships are not predictive of 
functional relationships. (Final Office Action, pages 33 and 37.) 

In the Brenner paper the SCOP database was used as a test set to test the reliability of 
sequence comparison methods. The SCOP database used in the Brenner paper is a database of 
proteins with known structures. The relationships among the SCOP proteins are already known 
based on non-sequence comparison methods. The structures and functions of the SCOP proteins 
do not need to be ascertained from sequence comparison methods. The Brenner results allow 
one to generalize to the much more common situation of NOT KNOWING the structural and 
functional relationships between two polypeptide sequences and trying to use sequence 
comparison methods to predict those relationships. As the Examiner acknowledges, Brenner 
does not discuss predicting functional similarity, but rather evolutionary relationships. 
(However, the "function" of the claimed polynucleotides or of the polypeptides encoded by the 
claimed polynucleotides is immaterial to the written description, given the description in the 
Specification and what is known to one of skill in the art.) Use of this database of proteins with 
known structures allowed the authors to determine whether homologies predicted from the 
sequence comparison methods tested in the article were truly similar structurally. Brenner is not 
trying to predict relationships between proteins; Brenner is evaluating known methods of 
predicting protein relationships. One cannot test the ability of sequence comparison methods in 
predicting actual structural homology if one starts with protein sequences whose structures were 
not already known previously and independently of the sequence comparison. 

VII. The Examiner asserts that "the instant genus claims are not limited by structural features" 
and that "the instant claims do not recite structural features, they recite only sequence 
homology." The Examiner further asserts that "[t]his is not the same [as] a structural feature 
such as a catalytic site or a binding site." (Final Office Action, pages 32-33.) 

Appellants note that the sequence of a polypeptide is well known in the art to constitute 
"structure." The amino acid sequence of a polypeptide is known as the "primary structure" of a 
polypeptide. For example, Stryer teaches that "[p]rimary structure is simply the sequence of 
amino acids and the location of disulfide bridges, if there are any. The primary structure is thus a 
complete description of the covalent connections of a protein." (L. Stryer, Biochemistry, 2nd 
edition, W.H. Freeman and Company, New York NY, 1981, page 32; Reference No. 19.) Claim 
25 limits the structure of the polypeptides encoded by the claimed polynucleotides to those 
naturally occurring amino acid sequences at least 95% identical to the amino acid sequence of 
SEQ ID NO:37. 

The Examiner further alleges that there is no limitation in the specification as to 
conservation of glycosylation and phosphorylation sites between SEQ ID NO:37 and its variants. 
Appellants refer the Board to the claims. The Specification adequately describes what is 
claimed. 

The Examiner alleges that "[n]either the specification nor claims identify common 
attributes shared by members of the genus in terms of use or function." (Final Office Action, 
page 33.) Appellants note that the claimed polynucleotides share structural attributes (% identity 
to SEQ ID NO:37 or SEQ ID NO:74). The claimed polynucleotides are naturally-occurring and 
thus share a "common use" in toxicology testing (see, e.g., supra, Issues 1 and 3). 

VIII. Summary 

The Final Office Action failed to base its written description inquiry "on whatever is now 
claimed." Consequently, the Action did not provide an appropriate analysis of the present claims 
and how they differ from those found not to satisfy the written description requirement in cases 
such as Lilly and Fiers. In particular, the claims of the subject application are fundamentally 
different from those found invalid in Lilly and Fiers. The subject matter of the present claims is 
defined in terms of the chemical structure of SEQ ID NO:37 or SEQ ID NO:74. The courts have 
stressed that structural features are important factors to consider in a written description analysis 
of claims to nucleic acids and proteins. In addition, the genus of polynucleotides defined by the 
present claims is adequately described, as evidenced by Brenner et al. Furthermore, there have 
been remarkable advances in the state of the art since the Lilly and Fiers cases, and these 
advances were given no consideration whatsoever in the position set forth by the Final Office 
Action. 

Issue 6: Provisional Double Patenting Rejection of Claims 25, 28, 29, 30, 32, 33, 39, 
41, and 42 

Claims 25, 28, 29, 30, 32, 33, 39, 41, and 42 were provisionally rejected under the 
judicially created doctrine of obviousness-type double patenting over Claims 1, 5, 6, and 7 of 
U.S. Application No. 09/539,800. While not conceding the propriety of the Examiner's position, 
Appellants are willing to submit a Terminal Disclaimer with respect to U.S. Application No. 
09/539,800 in the interest of expediting prosecution of the subject application, upon indication 
that the application is otherwise allowable. Therefore, it is requested that the Board indicate that 
the subject application will be allowable upon submission of such a Terminal Disclaimer. 

(9) CONCLUSION 

Appellants request that the rejections of the claims on appeal be reversed for at least the 
above reasons. 

Appellants respectfully submit that rejections for lack of utility based, inter alia, on an 
allegation of "lack of specificity," as set forth in the Final Office Action and as justified in the 
Revised Interim and final Utility Guidelines and Training Materials, are not supported in the law. 
Neither are they scientifically correct, nor supported by any evidence or sound scientific 
reasoning. These rejections are alleged to be founded on facts in court cases such as Brenner and 
Kirk, yet those facts are clearly distinguishable from the facts of the instant application, and 
indeed most if not all nucleotide and protein sequence applications. Nevertheless, the PTO is 
attempting to mold the facts and holdings of these prior cases, "like a nose of wax," 4 to target 
rejections of claims to polypeptide and polynucleotide sequences, where biological activity 
information has not been proven by laboratory experimentation, and they have done so by 
ignoring perfectly acceptable utilities fully disclosed in the specifications as well as 
well-established utilities known to those of skill in the art. As is disclosed in the specification, and 
even more clearly, as one of ordinary skill in the art would understand, the claimed invention has 
well-established, specific, substantial and credible utilities. The rejections are, therefore, 
improper and should be reversed. 

Moreover, to the extent the above rejections were based on the Revised Interim and final 
Examination Guidelines and Training Materials, those portions of the Guidelines and Training 
Materials that form the basis for the rejections should be determined to be inconsistent with the 
law. 

Due to the urgency of this matter and its economic and public health implications, an 
expedited review of this appeal is earnestly solicited. 

If the USPTO determines that any additional fees are due, the Commissioner is hereby 
authorized to charge Deposit Account No. 09-0108. 
This brief is enclosed in triplicate. 



4 "The concept of patentable subject matter under §101 is not 'like a nose of wax which may be turned and twisted
in any direction * * *.' White v. Dunbar, 119 U.S. 47, 51." (Parker v. Flook, 198 USPQ 193 (US SupCt 1978))
APPENDIX - CLAIMS ON APPEAL

25. (Previously Presented) An isolated polynucleotide encoding a polypeptide selected 
from the group consisting of: 

a) a polypeptide comprising the amino acid sequence of SEQ ID NO:37, 

b) a polypeptide comprising a naturally occurring amino acid sequence at least 95% 
identical to the amino acid sequence of SEQ ID NO:37, and 

c) an immunogenic fragment of a polypeptide having the amino acid sequence of SEQ 
ID NO:37. 

26. (Previously Presented) An isolated polynucleotide encoding a polypeptide 
comprising the amino acid sequence of SEQ ID NO:37. 

27. (Previously Presented) An isolated polynucleotide of claim 26 comprising the 
polynucleotide sequence of SEQ ID NO:74. 

28. (Previously Presented) An isolated recombinant polynucleotide comprising a 
promoter sequence operably linked to a polynucleotide of claim 25. 

29. (Previously Presented) An isolated cell transformed with a recombinant 
polynucleotide of claim 28. 

30. (Previously Presented) A method of producing a polypeptide encoded by a 
polynucleotide of claim 25, the method comprising: 

a) culturing a cell under conditions suitable for expression of the polypeptide, wherein 

said cell is transformed with a recombinant polynucleotide, and said recombinant 
polynucleotide comprises a promoter sequence operably linked to a polynucleotide of
claim 25, and 
b) recovering the polypeptide so expressed. 



31. (Previously Presented) A method of claim 30, wherein the polypeptide comprises 
the amino acid sequence of SEQ ID NO:37. 



32. (Previously Presented) An isolated polynucleotide selected from the group 
consisting of: 

a) a polynucleotide comprising the polynucleotide sequence of SEQ ID NO:74,

b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 
95% identical to the polynucleotide sequence of SEQ ID NO:74, 

c) a polynucleotide completely complementary to the polynucleotide of a) over the 
entire length of the polynucleotide of a), and 

d) a polynucleotide completely complementary to the polynucleotide of b) over the 
entire length of the polynucleotide of b). 

33. (Previously Presented) An isolated polynucleotide comprising at least 60 contiguous 
nucleotides of a polynucleotide of claim 32. 



39. (Previously Presented) A microarray wherein at least one element of the microarray 
is a polynucleotide of claim 43. 
41. (Previously Presented) An array comprising different nucleotide molecules affixed
in distinct physical locations on a solid substrate, wherein at least one of said nucleotide 
molecules comprises a first oligonucleotide or polynucleotide sequence completely 
complementary to 20 contiguous nucleotides of a target polynucleotide, and wherein said target 
polynucleotide is a polynucleotide of claim 32. 

43. (Previously Presented) An isolated polynucleotide comprising 20 contiguous 
nucleotides of a polynucleotide of claim 32. 

44. (Previously Presented) An isolated polynucleotide of claim 25 encoding a 
polypeptide comprising an amino acid sequence at least 95% identical to the amino acid 
sequence of SEQ ID NO:37 encoded by an allele of SEQ ID NO:74.

45. (Previously Presented) An isolated polynucleotide of claim 32 selected from the 
group consisting of: 

a) a polynucleotide comprising a sequence of an allele of SEQ ID NO:74 at least 
95% identical to the polynucleotide sequence of SEQ ID NO:74, 

b) a polynucleotide completely complementary to the polynucleotide of a) over the 
entire length of the polynucleotide of a). 
Title: NEW HUMAN REGULATORY PROTEINS
Mail Stop Appeal Brief-Patents
Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 



BRIEF ON APPEAL 



Sir:

Further to the Notice of Appeal filed December 8, 2003, and received by the USPTO on
December 10, 2003, herewith are three copies of Appellants' Brief on Appeal. Authorized fees
include the $330.00 fee for the filing of this Brief.

This is an appeal from the decision of the Examiner finally rejecting Claims 25-33, 39, 
41, 43, 44, and 45 of the above-identified application. 

(1) REAL PARTY IN INTEREST

The above-identified application is assigned of record to Incyte Pharmaceuticals, Inc., 
(now Incyte Corporation, formerly known as Incyte Genomics, Inc.) (Reel 8841, Frame 0213) 
which is the real party in interest herein. 
(2) RELATED APPEALS AND INTERFERENCES

Appellants, their legal representative and the assignee are not aware of any related 
appeals or interferences which will directly affect or be directly affected by or have a bearing on 
the Board's decision in the instant appeal. 



(3) STATUS OF THE CLAIMS

Claims rejected: Claims 25-33, 39, 41, 43, 44, and 45

Claims allowed: (none) 

Claims canceled: Claims 1-22 and 42 

Claims withdrawn: Claims 23-24, 34-38, and 40 

Claims on Appeal: Claims 25-33, 39, 41, 43, 44, and 45 (A copy of the claims 
on appeal, as amended, can be found in the attached Appendix.) 

(4) STATUS OF AMENDMENTS AFTER FINAL 

The Amendment after Final Rejection under 37 C.F.R. § 1.116 filed December 8, 2003 
has been entered for purposes of this appeal. In a telephone voicemail message to Appellants' 
representative on January 28, 2004, the Examiner stated that the amendment would be entered 
upon filing of an appeal. 



(5) SUMMARY OF THE INVENTION 

Appellants' invention is directed, inter alia, to an isolated polynucleotide encoding a 
regulatory protein (NHRP), in particular to the elected polynucleotide encoding NHRP-37 (SEQ 
ID NO:74). The claimed polynucleotide has a variety of utilities, in particular in expression 
profiling, and in particular for diagnosis of conditions or diseases characterized by expression of 
NHRP, for toxicology testing, and for drug discovery. (See the Specification at, e.g., page 55, 
line 4 through page 60, line 27.) As described in the Specification (page 32, lines 18-30): 

NHRP-37 (SEQ ID NO:37) was first identified in Incyte Clone 2507014 
from the CONUTUT01 cDNA library using a computer search for amino acid 
sequence alignments. A consensus sequence, SEQ ID NO:74, was derived from 
the extended and overlapping nucleic acid sequences: Incyte Clones 2507014 
(CONUTUT01), 1394758 (THYRNOT03), 1650580 (PROSTUT09), 2152990 
(BRAINOT09), 2361374 (LUNGFET03) and 2602153 (UTRSNOT10). 

In one embodiment, the invention encompasses a polypeptide comprising 
the amino acid sequence of SEQ ID NO:37. NHRP-37 is 350 amino acids in 
length and has two potential glycosylation sites at N147 and N185, and several
potential phosphorylation sites at S9, S17, T80, T122, S171, T174, T187, T237,
S293, S313, T315, S329, S340, and T342. NHRP-37 has sequence homology
with S. cerevisiae, GI 1322869, and is associated with cDNA libraries which are
immortalized or cancerous and show inflammatory or immune responses.

(6) ISSUES

1. Whether Claims 25-33, 39, 41, 43, 44, and 45 directed to polynucleotides meet the 
utility requirement of 35 U.S.C. § 101. 



2. Whether one of ordinary skill in the art would know how to use the polynucleotides of 
Claims 25-33, 39, 41, 43, 44, and 45, e.g., in toxicology testing, drug development, and the 
diagnosis of disease, so as to satisfy the enablement requirement of 35 U.S.C. § 112, first 
paragraph, with respect to the utility rejection. 

3. Whether one of ordinary skill in the art would know how to make and use a 
polynucleotide of Claims 25, 28-30, 32, 33, 39, 41, and 43-45, e.g., in toxicology testing, drug 
development, and the diagnosis of disease, so as to satisfy the enablement requirement of 35 
U.S.C. §112, first paragraph, with respect to polynucleotides encoding variants of SEQ ID
NO:37, polynucleotides encoding fragments of SEQ ID NO:37, polynucleotide variants of SEQ
ID NO:74, fragments of SEQ ID NO:74, fragments of polynucleotide variants of SEQ ID NO:74,
complementary polynucleotide sequences and RNA equivalents to the above. 

4. Whether the polynucleotides of Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 45 meet 
the written description requirement of 35 U.S.C. §112, first paragraph, with respect to allegedly 
new matter. 



5. Whether the claimed polynucleotide comprising a naturally-occurring polynucleotide 
sequence at least 95% identical to the polynucleotide sequence of SEQ ID NO:74 or the claimed 
polynucleotide encoding a polypeptide comprising a naturally-occurring amino acid sequence 
at least 95% identical to the amino acid sequence of SEQ ID NO:37 of Claims 25, 28, 29, 30, 32,
33, 39, 41, 44, and 45 meet the written description requirement of 35 U.S.C. §112, first 
paragraph. 
6. Whether the polynucleotides of Claims 25, 28, 29, 30, 32, 33, 39, 41, and 42 are
unpatentable under the judicially created doctrine of obviousness-type double patenting over 
Claims 1, 5, 6, and 7 of co-pending Application Serial No. 09/539,800. 

(7) GROUPING OF THE CLAIMS

As to Issue 1 

This issue pertains to Claims 25-33, 39, 41, 43, 44, and 45. 
As to Issue 2 

This issue pertains to Claims 25-33, 39, 41, 43, 44, and 45. 
As to Issue 3 

This issue pertains to Claims 25, 28-30, 32, 33, 39, 41, and 43-45. 
As to Issue 4 

This issue pertains to Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 45. 
As to Issue 5 

This issue pertains to Claims 25, 28, 29, 30, 32, 33, 39, 41, and 44-45. 
As to Issue 6 

This issue pertains to Claims 25, 28, 29, 30, 32, 33, 39, 41, and 42. 

(8) APPELLANTS' ARGUMENTS 
Issue 1: Utility Rejection of Claims 25-33, 39, 41, 43, 44, and 45 

Claims 25-33, 39, 41, 43, 44, and 45 stand rejected under 35 U.S.C. §§ 101 and 112, first
paragraph, based on the allegation that the claimed invention lacks patentable utility. The 
rejection alleges in particular that "the claimed invention is not supported by either a specific, 
substantial, credible asserted utility or a well-established utility." (Final Office Action, page 2.) 

The rejection of Claims 25-33, 39, 41, 43, 44, and 45 is improper, as the inventions of 
those claims have a patentable utility as set forth in the instant specification, and/or a 
utility well known to one of ordinary skill in the art. 

The invention at issue is a polynucleotide corresponding to a gene that is expressed in 
human tissue. The claimed invention has numerous practical, beneficial uses in toxicology 
testing, drug development, and the diagnosis of disease, none of which requires knowledge of 
how the polypeptide coded for by the polynucleotide actually functions. As a result of the 
benefits of these uses, the claimed invention already enjoys significant commercial success. 
Appellants previously submitted (in unexecuted form on January 27, 2003 and in
executed form on February 20, 2003) the First Declaration of Tod Bedilion describing some of
the practical uses of the claimed invention in gene and protein expression monitoring 
applications. The First Bedilion Declaration demonstrates that the positions and arguments made 
by the Patent Examiner with respect to the utility of the claimed polynucleotide are without 
merit. 

The First Bedilion Declaration describes, in particular, how the claimed expressed
polynucleotide can be used in gene expression monitoring applications that were well-known at
the time the patent application was filed, and how those applications are useful in developing
drugs and monitoring their activity. Dr. Bedilion states that the claimed invention is a useful tool
when employed as a highly specific probe in a cDNA microarray:

Persons skilled in the art would appreciate that cDNA microarrays that contained 
the SEQ ID NO:74 polynucleotide would be a more useful tool than cDNA 
microarrays that did not contain the SEQ ID NO:74 polynucleotide in connection 
with conducting gene expression monitoring studies on proposed (or actual) drugs 
for treating immune responses and cancers for such purposes as evaluating their 
efficacy and toxicity. (First Bedilion Declaration, ¶ 15.)

Appellants further submit three additional expert Declarations under 37 C.F.R. § 1.132, 
with respective attachments, and ten (10) scientific references filed before or shortly after the 
June 6, 1997 priority date of the instant application. 

The First Bedilion Declaration, Rockett Declaration, Iyer Declaration, Second Bedilion
Declaration, and the references fully establish that, prior to the June 6, 1997 filing date of the
parent Lai '870 application, it was well-established in the art that:

polynucleotides derived from nucleic acids expressed in one or more 
tissues and/or cell types can be used as hybridization probes - that is, as tools - 
to survey for and to measure the presence, the absence, and the amount of 
expression of their cognate gene; 

with sufficient length, at sufficient hybridization stringency, and with 
sufficient wash stringency - conditions that can be routinely established - 
expressed polynucleotides, used as probes, generate a signal that is specific to the 
cognate gene, that is, produce a gene-specific expression signal; 

expression analysis is useful, inter alia, in drug discovery and lead 
optimization efforts, in toxicology, particularly toxicology studies conducted early 
in drug development efforts, and in phenotypic characterization and 
categorization of cell types, including neoplastic cell types; 
each additional gene-specific probe used as a tool in expression analysis
provides an additional gene-specific signal that could not otherwise have been 
detected, giving a more comprehensive, robust, higher resolution, statistically 
more significant, and thus more useful expression pattern in such analyses than 
would otherwise have been possible; 

biologists, such as toxicologists, recognize the increased utility of more 
comprehensive, robust, higher resolution, statistically more significant results and 
thus want each newly identified expressed gene to be included in such an 
analysis; 



nucleic acid microarrays increase the parallelism of expression 
measurements, providing expression data analogous to that provided by older 
lower throughput techniques, but at substantially increased throughput; 

accordingly, when expression profiling is performed using microarrays 
each additional gene-specific probe that is included as a signaling component on 
this analytical device increases the detection range, and thus versatility of this 
research tool; 

biologists, such as toxicologists, recognize the increased utility of such 
improved tools, and thus want a gene-specific probe to each newly identified 
expressed gene to be included in such an analytical device; 

the industrial suppliers of microarrays recognize the increased utility of 
such improved tools to their customers, and thus strive to improve salability of 
their microarrays by adding each newly identified expressed gene to the 
microarrays they sell; 

it is not necessary that the biological function of a gene be known for 
measurement of its expression to be useful in drug discovery and lead 
optimization analyses, toxicology, or molecular phenotyping experiments; 

failure of a probe to detect changes in expression of its cognate gene does 
not diminish the usefulness of the probe as a research tool; and 

failure of a probe completely to detect its cognate transcript in any single 
expression analysis experiment does not deprive the probe of usefulness to the 
community of users who would use it as a research tool. 



Appellants file herewith: 

1. the Declaration of John C. Rockett, Ph.D., under 37 C.F.R. § 1.132, with Exhibits 
A-Q (hereinafter the "Rockett Declaration"); 

2. the Second Declaration of Tod Bedilion, Ph.D., under 37 C.F.R. § 1.132 
(hereinafter the "Second Bedilion Declaration"); 
3. the Declaration of Vishwanath R. Iyer, Ph.D., under 37 C.F.R. § 1.132 with
Exhibits A-E (hereinafter the "Iyer Declaration"); and

4. ten (10) references published before or shortly after the June 6, 1997 filing date of
the priority Lai '870 application:

a) PCT application WO 95/21944, SmithKline Beecham Corporation, 
Differentially expressed genes in healthy and diseased subjects (August 17, 1995) (Reference 
No. 1) 

b) PCT application WO 95/20681, Incyte Pharmaceuticals, Inc., Comparative
gene transcript analysis (August 3, 1995) (Reference No. 2) 

c) M. Schena et al., Quantitative monitoring of gene expression patterns with 
a complementary DNA microarray, Science 270:467-470 (October 20, 1995) (Reference No. 3) 

d) PCT application WO 95/35505, Stanford University, Method and 
apparatus for fabricating microarrays of biological samples (December 28, 1995) (Reference No. 
4) 

e) U.S. Pat. No. 5,569,588, M. Ashby et al., Methods for drug screening 
(October 29, 1996) (Reference No. 5) 

f) R. A. Heller et al., Discovery and analysis of inflammatory disease-related
genes using cDNA microarrays, Proc. Natl. Acad. Sci. USA 94:2150 - 2155 (March 1997) 
(Reference No. 6) 

g) PCT application WO 97/13877, Lynx Therapeutics, Inc., Measurement of 
gene expression profiles in toxicity determinations (April 17, 1997) (Reference No. 7) 

h) Acacia Biosciences Press Release (August 11, 1997) (Reference No. 8) 

i) V. Glaser, Strategies for Target Validation Streamline Evaluation of 
Leads, Genetic Engineering News (September 15, 1997) (Reference No. 9) 

j) J. L. DeRisi et al., Exploring the metabolic and genetic control of gene 
expression on a genomic scale, Science 278:680 - 686 (October 24, 1997) (Reference No. 10) 

The law has never required knowledge of biological function to prove utility. It is the 
claimed invention's uses, not its functions, that are the subject of a proper analysis under the 
utility requirement. 

In any event, as demonstrated by the First Bedilion Declaration, the Rockett Declaration, 
the Iyer Declaration, and the Second Bedilion Declaration, the person of ordinary skill in the art 
can achieve beneficial results from the claimed polynucleotide in the absence of any knowledge 
as to the precise function of the protein encoded by it. The uses of the claimed polynucleotide in
gene expression monitoring applications are in fact independent of its precise biological 
function. 



The Final Office Action is replete with arguments made and positions taken for the first 
time in a misplaced attempt to justify the rejections of the claims under 35 U.S.C. §§ 101 and 
112. This is particularly so with respect to the substantial, specific and credible utilities 
disclosed in the Lai '870 application relating to the use of the SEQ ID NO:74 polynucleotide for 
gene expression monitoring applications. Such gene expression monitoring applications are 
highly useful in drug development and in toxicity testing. 

The Final Office Action's new positions and arguments include that gene expression 
monitoring results obtained using the claimed polynucleotide as a control would allegedly be 
"uninformative," allegedly "would not add any information to a gene expression or toxicology 
assay regarding the response of the cell to the drug or toxic chemical," or are otherwise 
insufficient to constitute substantial, specific and credible utilities for the claimed polynucleotide 
(Final Office Action, e.g., page 15). In addition, the Final Office Action asserts that the 
Declaration of Dr. Tod Bedilion is insufficient to overcome the rejections, because it allegedly 
"does not present any concrete evidence for a specific and substantial utility for the claimed 
polynucleotides or the polypeptides encoded therefrom, and does not add any evidence to the 
instant disclosure regarding the specific and substantial utility of the claimed polynucleotide." 
(Final Office Action, page 11.) 

Under the circumstances, Appellants are submitting with this Appeal Brief the 
Declaration of John C. Rockett, Ph.D., under 37 C.F.R. § 1.132, with attached Exhibits A - Q; 
the Declaration of Vishwanath R. Iyer, Ph.D., under 37 C.F.R. § 1.132 with attached Exhibits
A-E; the Second Declaration of Tod Bedilion, Ph.D., under 37 C.F.R. § 1.132; and ten references
published before or shortly after the June 6, 1997 priority date of the instant application. As we 
will show, the Rockett Declaration, the Iyer Declaration, the Second Bedilion Declaration, and 
the accompanying references show the many substantial reasons why the Examiner's new 
positions and arguments with respect to the use of the claimed SEQ ID NO:74 polynucleotide in 
gene expression monitoring applications are without merit. 

The fact that the Rockett, Iyer, and Second Bedilion Declarations, along with the 
accompanying references, are being submitted in response to positions taken and arguments 
made for the first time in the Final Office Action, including arguments disregarding the 
persuasiveness of the First Bedilion Declaration, constitutes by itself "good and sufficient
reasons" under 37 C.F.R. §1.195 why these Declarations and references were not earlier
submitted and should be admitted at this time. Appellants also note that the submitted 
Declarations and references are responsive to the new utility rejection as framed by the Board of 
Appeals in copending cases with similar issues. 

I. The applicable legal standard 

To meet the utility requirement of sections 101 and 112 of the Patent Act, the patent
applicant need only show that the claimed invention is "practically useful," Anderson v. Natta,
480 F.2d 1392, 1397, 178 USPQ 458 (CCPA 1973) and confers a "specific benefit" on the
public. Brenner v. Manson, 383 U.S. 519, 534-35, 148 USPQ 689 (1966). As discussed in a
recent Court of Appeals for the Federal Circuit case, this threshold is not high:

An invention is "useful" under section 101 if it is capable of providing 
some identifiable benefit. See Brenner v. Manson, 383 U.S. 519, 534 [148 USPQ 
689] (1966); Brooktree Corp. v. Advanced Micro Devices, Inc., 977 F.2d 1555,
1571 [24 USPQ2d 1401] (Fed. Cir. 1992) ("to violate Section 101 the claimed
device must be totally incapable of achieving a useful result"); Fuller v. Berger,
120 F. 274, 275 (7th Cir. 1903) (test for utility is whether invention "is incapable
of serving any beneficial end"). Juicy Whip Inc. v. Orange Bang Inc., 51
USPQ2d 1700 (Fed. Cir. 1999).

While an asserted utility must be described with specificity, the patent applicant need not 
demonstrate utility to a certainty. In Stiftung v. Renishaw PLC, 945 F.2d 1173, 1180, 20
USPQ2d 1094 (Fed. Cir. 1991), the United States Court of Appeals for the Federal Circuit 
explained: 

An invention need not be the best or only way to accomplish a certain 
result, and it need only be useful to some extent and in certain applications: 
"[T]he fact that an invention has only limited utility and is only operable in 
certain applications is not grounds for finding lack of utility." Envirotech Corp. v.
Al George, Inc., 730 F.2d 753, 762, 221 USPQ 473, 480 (Fed. Cir. 1984).

The specificity requirement is not, therefore, an onerous one. If the asserted utility is 
described so that a person of ordinary skill in the art would understand how to use the claimed 
invention, it is sufficiently specific. See Standard Oil Co. v. Montedison, S.p.a., 212 U.S.P.Q. 
327, 343 (3d Cir. 1981). The specificity requirement is met unless the asserted utility amounts to 
a "nebulous expression" such as "biological activity" or "biological properties" that does not 
convey meaningful information about the utility of what is being claimed. Cross v. Iizuka, 753
F.2d 1040, 1048 (Fed. Cir. 1985).

In addition to conferring a specific benefit on the public, the benefit must also be 
"substantial." Brenner, 383 U.S. at 534. A "substantial" utility is a practical, "real-world" 
utility. Nelson v. Bowler, 626 F.2d 853, 856, 206 USPQ 881 (CCPA 1980). 

If persons of ordinary skill in the art would understand that there is a "well-established" 
utility for the claimed invention, the threshold is met automatically and the applicant need not 
make any showing to demonstrate utility. Manual of Patent Examining Procedure at § 706.03(a). 
Only if there is no "well-established" utility for the claimed invention must the applicant 
demonstrate the practical benefits of the invention. Id. 

Once the patent applicant identifies a specific utility, the claimed invention is presumed 
to possess it. In re Cortright, 165 F.3d 1353, 1357, 49 USPQ2d 1464 (Fed. Cir. 1999); In re 
Brana, 51 F.3d 1560, 1566, 34 USPQ2d 1436 (Fed. Cir. 1995). In that case, the Patent Office
bears the burden of demonstrating that a person of ordinary skill in the art would reasonably 
doubt that the asserted utility could be achieved by the claimed invention. Id. To do so, the 
Patent Office must provide evidence or sound scientific reasoning. See In re Langer, 503 F.2d
1380, 1391-92, 183 USPQ 288 (CCPA 1974). If and only if the Patent Office makes such a 
showing, the burden shifts to the applicant to provide rebuttal evidence that would convince the 
person of ordinary skill that there is sufficient proof of utility. Brana, 51 F.3d at 1566. The 
applicant need only prove a "substantial likelihood" of utility; certainty is not required. Brenner, 
383 U.S. at 532. 

II. Uses of the claimed polynucleotide for diagnosis of conditions and disorders
characterized by expression of NHRP, for toxicology testing, and for drug discovery
are sufficient utilities under 35 U.S.C. §§ 101 and 112, first paragraph

The claimed invention meets all of the necessary requirements for establishing a credible 
utility under the Patent Law: There are "well-established" uses for the claimed invention known 
to persons of ordinary skill in the art, and there are specific practical and beneficial uses for the 
invention disclosed in the patent application's specification. These uses are explained, in detail, 
in the previously submitted First Bedilion Declaration, and in the Rockett Declaration, Iyer 
Declaration, and Second Bedilion Declaration accompanying this brief. Objective evidence, not 
considered by the Patent Office, further corroborates the credibility of the asserted utilities. 
A. The uses of the claimed polynucleotide for toxicology testing, drug discovery,
and disease diagnosis are practical uses that confer "specific benefits" to the public

The claimed invention has specific, substantial, real-world utility by virtue of its use in 
toxicology testing, drug development and disease diagnosis through gene expression profiling. 
These uses are explained in detail in the previously submitted First Bedilion Declaration, and in 
the accompanying Rockett Declaration, Iyer Declaration, and Second Bedilion Declaration. The 
claimed invention is a useful tool in cDNA microarrays used to perform gene expression 
analysis. That is sufficient to establish utility for the claimed polynucleotide. 

The instant application is a continuation application of and claims priority to United 
States patent application Serial No. 08/870,870 filed on June 6, 1997 (hereinafter "the Lai '870 
application"), having essentially the identical specification, with the exception of corrected 
typographical errors and reformatting changes. Thus page and line numbers may not match as 
between the Lai '506 application and the Lai '870 application. 

In his First Declaration, Dr. Bedilion explains the many reasons why a person skilled in 
the art reading the Lai '870 application on June 6, 1997 would have understood that application 
to disclose the claimed polynucleotide to be useful for a number of gene expression monitoring 
applications, e.g., as a highly specific probe for the expression of that specific polynucleotide in 
connection with the development of drugs and the monitoring of the activity of such drugs. (First 
Bedilion Declaration at, e.g., ¶¶ 10-15). Much, but not all, of Dr. Bedilion's explanation
concerns the use of the claimed polynucleotide in cDNA microarrays of the type first developed
at Stanford University for evaluating the efficacy and toxicity of drugs, as well as for other
applications. (First Bedilion Declaration, ¶¶ 12 and 15). 1

In connection with his explanations, Dr. Bedilion states that the "Lai '870 application 
would have led a person skilled in the art on June 6, 1997 who was using gene expression 
monitoring in connection with working on developing new drugs for the treatment of immune 
responses and cancers to conclude that a cDNA microarray that contained the SEQ ID NO:74 
polynucleotide would be a highly useful tool and to request specifically that any cDNA 
microarray that was being used for such purposes contain the SEQ ID NO:74 polynucleotide." 
(First Bedilion Declaration, ¶ 15). For example, as explained by Dr. Bedilion, "[p]ersons skilled
in the art would [have appreciated on June 6, 1997] that cDNA microarrays that contained the 

1 Dr. Bedilion also explained, for example, why persons skilled in the art would also appreciate, based on the Lai
'870 application, that the claimed polynucleotide would be useful in connection with developing new drugs using
[illegible] analysis, that predated [illegible] the development of the cDNA technology.
SEQ ID NO:74 polynucleotide would be a more useful tool than cDNA microarrays that did not
contain the SEQ ID NO:74 polynucleotide in connection with conducting gene expression 
monitoring studies on proposed (or actual) drugs for treating immune responses and cancers for 
such purposes as evaluating their efficacy and toxicity." Id. 

In support of those statements, Dr. Bedilion provided detailed explanations of how cDNA 
technology can be used to conduct gene expression monitoring evaluations, with extensive 
citations to pre-June 6, 1997 publications showing the state of the art on June 6, 1997. (First 
Bedilion Declaration, ¶¶ 10-14). While Dr. Bedilion's explanations in paragraph 15 of his
Declaration include almost three pages of text and six subparts (a)-(f), he specifically states that 
his explanations are not "all-inclusive." Id. For example, with respect to toxicity evaluations, 
Dr. Bedilion had earlier explained how persons skilled in the art who were working on drug 
development on June 6, 1997 (and for several years prior to June 6, 1997) "without any doubt" 
appreciated that the toxicity (or lack of toxicity) of any proposed drug was "one of the most 
important criteria to be considered and evaluated in connection with the development of the 
drug" and how the teachings of the Lai '870 application clearly include using differential gene 
expression analyses in toxicity studies (First Bedilion Declaration, ¶ 10). 

Thus, the First Bedilion Declaration establishes that persons skilled in the art reading the 
Lai '870 application at the time it was filed "would have wanted their cDNA microarray to have 
a [SEQ ID NO:74 polynucleotide probe] because a microarray that contained such a probe (as 
compared to one that did not) would provide more useful results in the kind of gene expression 
monitoring studies using cDNA microarrays that persons skilled in the art have been doing since 
well prior to June 6, 1997." (First Bedilion Declaration, ¶ 15, item (f).) This, by itself, provides 
more than sufficient reason to compel the conclusion that the Lai '870 application disclosed to 
persons skilled in the art at the time of its filing substantial, specific and credible real-world 
utilities for the claimed polynucleotide. 

In his Declaration, Dr. Rockett explains the many reasons why a person skilled in the art 
in 1997 would have understood that any expressed polynucleotide is useful for a number of gene 
expression monitoring applications, e.g., in cDNA microarrays, in connection with the 
development of drugs and the monitoring of the activity of such drugs. (Rockett Declaration at, 
e.g., ¶¶ 10-18). 



It is my opinion, therefore, based on the state of the art in toxicology at 
least since the mid-1990s . . . that disclosure of the sequence of a new gene or 
protein, with or without knowledge of its biological function, would have been 
sufficient information for a toxicologist to use the gene and/or protein in 
expression profiling studies in toxicology. [Rockett Declaration, ¶ 18.]² 

In his Second Declaration, Dr. Bedilion explains why a person of skill in the art in [year 
of filing] would have understood that any expressed polynucleotide is useful for gene expression 
monitoring applications using cDNA microarrays. (Second Bedilion Declaration, e.g., ¶¶ 4-7.) 
In his Declaration, Dr. Iyer explains why a person of skill in the art in 1997 would have 
understood that any expressed polynucleotide is useful for gene expression monitoring 
applications using cDNA microarrays, stating that "[t]o provide maximum versatility as a 
research tool, the microarray should include - and as a biologist I would want my microarray to 
include - each newly identified gene as a probe." (Iyer Declaration, ¶ 9.) 

In addition, Dr. Rockett explains in his Declaration that "there are a number of other 
differential expression analysis technologies that precede the development of microarrays, some 
by decades, and that have been applied to drug metabolism and toxicology research, including: 
(1) differential screening; (2) subtractive hybridization, including variants such as chemical 
cross-linking subtraction, suppression-PCR subtractive hybridization and representational 
difference analysis; (3) differential display; (4) restriction endonuclease facilitated analyses, 
including serial analysis of gene expression (SAGE) and gene expression fingerprinting and (5) 
EST analysis." (Rockett Declaration, ¶ 7.) 

Nowhere does the Patent Examiner address the fact that, as described, e.g., at page 14, 
lines 21-23, page 56, lines 15-19, page 58, line 8 through page 59, line 28, and page 67, line 21 
through page 68, line 14 of the Lai '506 application, the claimed polynucleotide can be used as a 
highly specific probe in, for example, cDNA microarrays - a probe that without question can be 
used to measure both the existence and amount of complementary RNA sequences known to be 
the expression products of the claimed polynucleotide. The claimed invention is not, in that 
regard, some random sequence whose value as a probe is speculative or would require further 
research to determine. 

Given the fact that the claimed polynucleotide is known to be expressed, its utility as a 
measuring and analyzing instrument for expression levels is as indisputable as a scale's utility for 
measuring weight. This use as a measuring tool, regardless of how the expression level data 
ultimately would be used by a person of ordinary skill in the art, by itself demonstrates that the 
claimed invention provides an identifiable, real-world benefit that meets the utility requirement. 
Raytheon v. Roper, 724 F.2d 951 (Fed. Cir. 1983) (claimed invention need only meet one of its 
stated objectives to be useful); In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999) (how the 
invention works is irrelevant to utility); MPEP § 2107 ("Many research tools such as gas 
chromatographs, screening assays, and nucleotide sequencing techniques have a clear, specific, 
and unquestionable utility (e.g., they are useful in analyzing compounds)." (emphasis added)). 

² "Use of the words 'it is my opinion' to preface what someone of ordinary skill in the art would have known does 
not transform the factual statements contained in the declaration into opinion testimony." In re Alton, 37 USPQ2d 
1578, 1583 (Fed. Cir. 1996). 

The First Bedilion Declaration shows that a number of pre-June 6, 1997 publications 
confirm and further establish the utility of cDNA microarrays in a wide range of drug 
development gene expression monitoring applications at the time the Lai '870 application was 
filed (First Bedilion Declaration ¶¶ 10-14; First Bedilion Exhibits A-G). Indeed, Brown and 
Shalon U.S. Patent No. 5,807,522 (the Brown '522 patent, First Bedilion Exhibit D), which 
issued from a patent application filed in June 1995 and was effectively published on December 
29, 1995 as a result of the publication of a PCT counterpart application, shows that the Patent 
Office recognizes the patentable utility of the cDNA technology developed in the early to 
mid-1990s. As explained by Dr. Bedilion, among other things (First Bedilion Declaration, ¶ 12): 

The Brown '522 patent further teaches that the "[m]icroarrays of 
immobilized nucleic acid sequences prepared in accordance with the invention" 
can be used in "numerous" genetic applications, including "monitoring of gene 
expression" applications (see Bedilion Tab D at col. 14, lines 36-42). The Brown 
'522 patent teaches (a) monitoring gene expression (i) in different tissue types, (ii) 
in different disease states, and (iii) in response to different drugs, and (b) that 
arrays disclosed therein may be used in toxicology studies (see First Bedilion Tab 
D at col. 15, lines 13-18 and 52-58; and col. 18, lines 25-30). 



Literature reviews published after the filing of the priority Lai '870 application describing 
the state of the art further confirm the claimed invention's utility. Rockett et al. confirm, for 
example, that the claimed invention is useful for differential expression analysis regardless of 
how expression is regulated: 

Despite the development of multiple technological advances which have 
recently brought the field of gene expression profiling to the forefront of 
molecular analysis, recognition of the importance of differential gene expression 
and characterization of differentially expressed genes has existed for many years. 

* * * 

Although differential expression technologies are applicable to a broad 
range of models, perhaps their most important advantage is that, in most cases, 
absolutely no prior knowledge of the specific genes which are up- or 
down-regulated is required. 

* * * 

Whereas it would be informative to know the identity and functionality of 
all genes up/down regulated by . . . toxicants, this would appear a longer term 
goal . . . . However, the current use of gene profiling yields a pattern of gene 
changes for a xenobiotic of unknown toxicity which may be matched to that of 
well characterized toxins, thus alerting the toxicologist to possible in vivo 
similarities between the unknown and the standard, thereby providing a platform 
for more extensive toxicological examination. (emphasis in original) 

Rockett et al., Differential gene expression in drug metabolism and toxicology: 
practicalities, problems and potential, Xenobiotica 29:655-691 (July 1999) (Rockett Declaration, 
Exhibit C). 

In another post-June 6, 1997 article, Lashkari et al. state explicitly that sequences that are 
merely "predicted" to be expressed (predicted Open Reading Frames, or ORFs) - the claimed 
invention in fact is known to be expressed - have numerous uses: 

Efforts have been directed toward the amplification of each predicted ORF 
or any other region of the genome ranging from a few base pairs to several 
kilobase pairs. There are many uses for these amplicons - they can be cloned into 
standard vectors or specialized expression vectors, or can be cloned into other 
specialized vectors such as those used for two-hybrid analysis. The amplicons 
can also be used directly by, for example, arraying onto glass for expression 
analysis, for DNA binding assays, or for any direct DNA assay. (emphasis added) 

Lashkari et al., Whole genome analysis: Experimental access to all genome sequenced 
segments through larger-scale efficient oligonucleotide synthesis and PCR, Proc. Nat. Acad. Sci. 
U.S.A. 94:8945-8947 (Aug. 1997) (Reference No. 11). 

B. The use of polynucleotides coding for polypeptides expressed by humans as 
tools for toxicology testing, drug discovery, and the diagnosis of disease is now 
"well-established" 

The technologies made possible by expression profiling and the DNA tools upon which 
they rely are now well-established. The technical literature recognizes not only the prevalence of 
these technologies, but also their unprecedented advantages in drug development, testing and 
safety assessment. These technologies include toxicology testing, e.g., as described by Bedilion, 
Rockett, and Iyer in their Declarations. 

Toxicology testing is now standard practice in the pharmaceutical industry. See, e.g., 
John C. Rockett et al., supra: 

Knowledge of toxin-dependent regulation in target tissues is not solely an 
academic pursuit as much interest has been generated in the pharmaceutical 
industry to harness this technology in the early identification of toxic drug 
candidates, thereby shortening the developmental process and contributing 
substantially to the safety assessment of new drugs. (Rockett Declaration, Exhibit 
C, page 656) 

To the same effect are several other scientific publications, including Emile F. Nuwaysir 
et al., Microarrays and toxicology: The advent of toxicogenomics, Molecular Carcinogenesis 
24:153-159 (1999) (Reference No. 12); Sandra Steiner and N. Leigh Anderson, Expression 
profiling in toxicology -- potentials and limitations, Toxicology Letters 112-13:467-471 (2000) 
(Reference No. 13). 

Nucleic acids useful for measuring the expression of whole classes of genes are routinely 
incorporated for use in toxicology testing. Nuwaysir et al. describe, for example, a Human 
ToxChip comprising 2089 human clones, which were selected 

for their well-documented involvement in basic cellular processes as well 
as their responses to different types of toxic insult. Included on this list are DNA 
replication and repair genes, apoptosis genes, and genes responsive to PAHs and 
dioxin-like compounds, peroxisome proliferators, estrogenic compounds, and 
oxidant stress. Some of the other categories of genes include transcription factors, 
oncogenes, tumor suppressor genes, cyclins, kinases, phosphatases, cell adhesion 
and motility genes, and homeobox genes. Also included in this group are 84 
housekeeping genes, whose hybridization intensity is averaged and used for signal 
normalization of the other genes on the chip. 

See also Table 1 of Nuwaysir et al. (listing additional classes of genes deemed to be of 
special interest in making a human toxicology microarray). 

The more genes that are available for use in toxicology testing, the more powerful the 
technique. "Arrays are at their most powerful when they contain the entire genome of the 
species they are being used to study." John C. Rockett and David J. Dix, Application of DNA 
arrays to toxicology , Environ. Health Perspec. 107:681-685 (1999) (Reference No. 14). Control 
genes are carefully selected for their stability across a large set of array experiments in order to 
best study the effect of toxicological compounds. See attached email from the primary 
investigator on the Nuwaysir paper, Dr. Cynthia Afshari, to an Incyte employee, dated July 3, 
2000, as well as the original message to which she was responding (Reference No. 15), 
indicating that even the expression of carefully selected control genes can be altered. Thus, there 
is no expressed gene which is irrelevant to screening for toxicological effects, and all expressed 
genes have a utility for toxicological screening. 

Further evidence of the well-established utility of all expressed polypeptides and 
polynucleotides in toxicology testing is found in U.S. Pat. No. 5,569,588 (Reference No. 5) and 
published PCT applications WO 95/21944 (Reference No. 1), WO 95/20681 (Reference No. 2), 
and WO 97/13877 (Reference No. 7). 

WO 95/21944 ("Differentially expressed genes in healthy and diseased subjects"), 
published August 17, 1995, describes the use of microarrays in expression profiling analyses, 
emphasizing that patterns of expression can be used to distinguish healthy tissues from diseased 
tissues and that patterns of expression can additionally be used in drug development and 
toxicology studies, without knowledge of the biological function of the encoded gene product. 
In particular, and with emphasis added: 

The present invention involves . . . methods for diagnosing diseases 
characterized by the presence of [differentially expressed] . . . genes, despite the 
absence of knowledge about the gene or its function. The methods involve the 
use of a composition suitable for use in hybridization which consists of a solid 
surface on which is immobilized at pre-defined regions thereon a plurality of 
defined oligonucleotide/polynucleotide sequences for hybridization. Each 
sequence comprises a fragment of an EST. . . . Differences in hybridization 
patterns produced through use of this composition and the specified methods 
enable diagnosis of diseases based on differential expression of genes of unknown 
function. [abstract] 

The method [of the present invention] involves producing and comparing 
hybridization patterns formed between samples of expressed mRNA or cDNA 
polynucleotide sequences . . . and a defined set of 
oligonucleotide/polynucleotide[] . . . immobilized on a support. Those defined 
[immobilized] oligonucleotide/polynucleotide sequences are representative of the 
total expressed genetic component of the cells, tissues, organs or organism as 
defined by the collection of partial cDNA sequences (ESTs). [page 2] 

The present invention meets the unfilled needs in the art by providing 
methods for the . . . use of gene fragments and genes, even those of unknown full 
length sequence and unknown function, which are differentially expressed in a 
healthy animal and in an animal having a specific disease or infection by use of 
ESTs derived from DNA libraries of healthy and/or diseased/infected animals. 
[page 4] 

Yet another aspect of the invention is that it provides . . . a means for . . . 
monitoring the efficacy of disease treatment regimes including . . . toxicological 
effects thereof. [page 4] 

It has been appreciated that one or more differentially identified EST or 
gene-specific oligonucleotide/polynucleotides define a pattern of differentially 
expressed genes diagnostic of a predisease, disease or infective state. . . . A 
knowledge of the specific biological function of the EST is not required only that 
the EST[] identifies a gene or genes whose altered expression is associated 
reproducibly with the predisease, disease or infectious state. [page 4] 

As used herein, the term 'disease' or 'disease state' refers to any condition 
which deviates from a normal or standardized healthy state in an organism of the 
same species in terms of differential expression of the organism's genes 
[whether] of genetic or environmental origin, for example, an inherited disorder 
such as certain breast cancers. . . . [or] administration of a drug or exposure of the 
animal to another agent, e.g., nutrition, which affects gene expression. [page 5] 

As used herein, the term 'solid support' refers to any known substrate 
which is useful for the immobilization of large numbers of 
oligonucleotide/polynucleotide sequences by any available method . . . [and 
includes, inter alia,] nitrocellulose, . . . glass, silica. . . . [page 6] 

By 'EST' or 'Expressed Sequence Tag' is meant a partial DNA or cDNA 
sequence of about 150 to 500, more preferably about 300, sequential nucleotides 
. . . . [page 6] 

One or more libraries made from a single tissue type typically provide at 
least about 3000 different (i.e., unique) ESTs and potentially the full complement 
of all possible ESTs representing all cDNAs e.g., 50,000-100,000 in an animal 
such as a human. [page 7] 

The lengths of the defined oligonucleotide/polynucleotides may be 
readily increased or decreased as desired or needed. . . . The length is generally 
guided by the principle that it should be of sufficient length to insure that it is on 
average only represented once in the population to be examined. [page 7] 

Comparing the . . . hybridization patterns permits detection of those 
defined oligonucleotide/polynucleotides which are differentially expressed 
between the healthy control and the disease sample by the presence of differences 
in the hybridization patterns at pre-defined regions [of the solid support]. [page 
1[illegible]] 

It should be appreciated that one does not have to be restricted in using 
ESTs from a particular tissue from which probe RNA or cDNA is obtained[;] 
rather any or all ESTs (known or unknown) may be placed on the support. 
Hybridization will be used [to] form diagnostic patterns or to identify which 
particular EST is detected. For example, all known ESTs from an organism are 
used to produce a 'master' solid support to which control sample and disease 
samples are alternately hybridized. [page 14] 

Diagnosis is accomplished by comparing the two hybridization patterns 
wherein substantial differences between the first and second hybridization 
patterns indicate the presence of the selected disease or infection in the animal 
being tested. Substantially similar first and second hybridization patterns indicate 
the absence of disease or infection. This[,] like many of the foregoing 
embodiments[,] may use known or unknown ESTs derived from many libraries. 
[page 18] 

Still another intriguing use of this method is in the area of monitoring the 
effects of drugs on gene expression, both in laboratories and during clinical trials 
with animal[s], especially humans. [page 18] 

WO 95/20681 ("Comparative Gene Transcript Analysis"), filed in 1994 by Appellants' 
assignee and published August 3, 1995, has three issued U.S. counterparts: U.S. Pat. Nos. 
5,840,484, issued November 24, 1998; 6,114,114, issued September 5, 2000; and 6,303,297, 
issued October 16, 2001. 

The specification describes the use of transcript expression patterns, or "images", each 
comprising multiple pixels of gene-specific information, for diagnosis, for cellular phenotyping, 
and in toxicology and drug development efforts. The specification describes a plurality of 
methods for obtaining the requisite expression data -- one of which is microarray hybridization -- 
and equates the uses of the expression data from these disparate platforms. In particular, and 
with emphasis added: 

The invention provides a "method and system for quantifying the relative 
abundance of gene transcripts in a biological specimen. . . . [G]ene transcript 
imaging can be used to detect or diagnose a particular biological state, disease, or 
condition which is correlated to the relative abundance of gene transcripts in a 
given cell or population of cells. The invention provides a method for comparing 
the gene transcript image analysis from two or more different biological 
specimens in order to distinguish between the two specimens and identify one or 
more genes which are differentially expressed between the two specimens. 
[abstract] 

[W]e see each individual gene product as a 'pixel' of information, which 
relates to the expression of that, and only that, gene. We teach herein [] methods 
whereby the individual 'pixels' of gene expression information can be combined 
into a single gene transcript 'image,' in which each of the individual genes can be 
visualized simultaneously and allowing relationships between the gene pixels to 
be easily visualized and understood. [page 2] 

The present invention avoids the drawbacks of the prior art by providing a 
method to quantify the relative abundance of multiple gene transcripts in a given 
biological specimen. . . . The method of the instant invention provides for detailed 
diagnostic comparisons of cell profiles revealing numerous changes in the 
expression of individual transcripts. [page 6] 

High resolution analysis of gene expression be used directly as a 
diagnostic profile . . . . [page 7] 

The method is particularly powerful when more than 100 and preferably 
more than 1,000 gene transcripts are analyzed. [page 7] 

The invention . . . includes a method of comparing specimens containing 
gene transcripts. [page 7] 

The final data values from the first specimen and the further identified 
sequence values from the second specimen are processed to generate ratios of 
transcript sequences, which indicate the differences in the number of gene 
transcripts between the two specimens. [i.e., the results yield analogous data to 
microarrays] [page 8] 

Also disclosed is a method of producing a gene transcript image analysis 
by first obtaining a mixture of mRNA, from which cDNA copies are made. [page 8] 

In a further embodiment, the relative abundance of the gene transcripts in 
one cell type or tissue is compared with the relative abundance of gene transcript 
numbers in a second cell type or tissue in order to identify the differences and 
similarities. [page 9] 

In essence, the invention is a method and system for quantifying the 
relative abundance of gene transcripts in a biological specimen. The invention 
provides a method for comparing the gene transcript image from two or more 
different biological specimens in order to distinguish between the two specimens. 
. . . [page 9] 

[T]wo or more gene transcript images can be compared and used to detect 
or diagnose a particular biological state, disease, or condition which is correlated 
to the relative abundance of gene transcripts in a given cell or population of cells. 
[pages 9-10] 

The present invention provides a method to compare the relative 
abundance of gene transcripts in different biological specimens. . . . This process 
is denoted herein as gene transcript imaging. The quantitative analysis of the 
relative abundance for a set of gene transcripts is denoted herein as 'gene 
transcript image analysis' or 'gene transcript frequency analysis'. The present 
invention allows one to obtain a profile for gene transcription in any given 
population of cells or tissue from any type of organism. [page 11] 

The invention has significant advantages in the fields of diagnostics, 
toxicology and pharmacology, to name a few. [page 12] 

[G]ene transcript sequence abundances are compared against reference 
database sequence abundances including normal data sets for diseased and healthy 
patients. The patient has the disease(s) with which the patient's data set most 
closely correlates. [page 12] 

For example, gene transcript frequency analysis can be used to 
differentiate normal cells or tissues from diseased cells or tissues [page 12] 

In toxicology, . . . [g]ene transcript imaging provides highly detailed 
information on the cell and tissue environment, some of which would not be 
obvious in conventional, less detailed screening methods. The gene transcript 
image is a more powerful method to predict drug toxicity and efficacy. Similar 
benefits accrue in the use of this tool in pharmacology. [page 12] 

In an alternative embodiment, comparative gene transcript frequency 
analysis is used to differentiate between cancer cells which respond to anti-cancer 
agents and those which do not respond. [page 12] 

In a further embodiment, comparative gene transcript frequency analysis is 
used . . . for the selection of better pharmacologic animal models." [page 14] 

In a further embodiment, comparative gene transcript frequency analysis is 
used in a clinical setting to give a highly detailed gene transcript profile of a 
diseased state or condition. [page 14] 

An alternate method of producing a gene transcript image includes the 
steps of obtaining a mixture of test mRNA and providing a representative array of 
unique probes whose sequences are complementary to at least some of the test 
mRNAs. Next, a fixed amount of the test mRNA is added to the arrayed probes. 
The test mRNA is incubated with the probes for a sufficient time to allow hybrids 
of the test mRNA and probes to form. The mRNA-probe hybrids are detected and 
the quantity determined. [page 15] 

[T]his research tool provides a way to get new drugs to the public faster 
and more economically." [page 36] 

In this method, the particular physiologic function of the protein transcript 
need not be determined to qualify the gene transcript as a clinical marker. [page 
38] 

[T]he gene transcript changes noted in the earlier rat toxicity study are 
carefully evaluated as clinical markers in the followed patients. Changes in the 
gene transcript image analyses are evaluated as indicators of toxicity by 
correlation with clinical signs and symptoms and other laboratory results. . . . The 
. . . analysis highlights any toxicological changes in the treated patients. [page 39] 



U.S. Pat. No. 5,569,588 ("Methods for Drug Screening") ("the '588 patent"), issued 
October 29, 1996, with a priority date of August 1995, describes an expression profiling 
platform, the "genome reporter matrix", which is different from nucleic acid microarrays. 
Additionally describing use of nucleic acid microarrays, the '588 patent makes clear that the 
utility of comparing multidimensional expression datasets is independent of the methods by 
which such profiles are obtained. The '588 patent speaks clearly to the usefulness of such 
expression analyses in drug development and toxicology, particularly pointing out that a gene's 
failure to change in expression level is a useful result. Thus, with emphasis added, 

The invention provides "[m]ethods and compositions for modeling the 
transcriptional responsiveness of an organism to a candidate drug. . . . [The final 
step of the method comprises] comparing reporter gene product signals for each 
cell before and after contacting the cell with the candidate drug to obtain a drug 
response profile which provides a model of the transcriptional responsiveness of 
said organism to the candidate drug." [abstract] 

The present invention exploits the recent advances in genome science to 
provide for the rapid screening of large numbers of compounds against a systemic 
target comprising substantially all targets in a pathway [or] organism. [col. 1] 

The ensemble of reporting cells comprises as comprehensive a collection 
of transcription regulatory genetic elements as is conveniently available for the 
targeted organism so as to most accurately model the systemic transcriptional 
response. Suitable ensembles generally comprise thousands of individually 
reporting elements; preferred ensembles are substantially comprehensive, i.e. 
provide a transcriptional response diversity comparable to that of the target 
organism. Generally, a substantially comprehensive ensemble requires 
transcription regulatory genetic elements from at least a majority of the 
organism's genes, and preferably includes those of all or nearly all of the genes. 
We term such a substantially comprehensive ensemble a genome reporter matrix. 
[col. 2] 

Drugs often have side effects that are in part due to the lack of target 
specificity. . . . [A] genome reporter matrix reveals the spectrum of other genes in 
the genome also affected by the compound. In considering two different 
compounds both of which induce the ERG10 reporter, if one compound affects 
the expression of 5 other reporters and a second compound affects the expression 
of 50 other reporters, the first compound is, a priori, more likely to have fewer side 
effects. [cols. 2-3] 

Furthermore, it is not necessary to know the identity of any of the 
responding genes. [col. 3] 

[A]ny new compound that induces the same response profile as [a] . . . 
dominant tubulin mutant would provide a candidate for a taxol-like 
pharmaceutical. [col. 4] 



The genome reporter matrix offers a simple solution to recognizing new 
specificities in combinatorial libraries. Specifically, pools of new compounds are 
tested as mixtures across the matrix. If the pool has any new activity not present 
in the original lead compound, new genes are affected among the reporters. [col. 
[illegible]] 


A sufficient number of different recombinant cells are included to provide 
an ensemble of transcriptional regulatory elements of said organism sufficient to 
model the transcriptional responsiveness of said organism to a drug. In a preferred 
embodiment, the matrix is substantially comprehensive for the selected regulatory 
elements, e.g. essentially all of the gene promoters of the targeted organism are 
included. [cols. 6-7] 

In a preferred embodiment, the basal response profiles are determined. . . . 
The resultant electrical output signals are stored in a computer memory as 
genome reporter output signal matrix data structure associating each output signal 
with the coordinates of the corresponding microtiter plate well and the stimulus or 
drug. This information is indexed against the matrix to form reference response 
profiles that are used to determine the response of each reporter to any milieu in 
which a stimulus may be provided. After establishing a basal response profile for 
the matrix, each cell is contacted with a candidate drug. The term drug is used 
loosely to refer to agents which can provoke a specific cellular response. The 
drug induces a complex response pattern of repression, silence and induction 
across the matrix. . . . The response profile reflects the cell's transcriptional 
adjustments to maintain homeostasis in the presence of the drug. . . . After 
contacting the cells with the candidate drug, the reporter gene product signals 
from each of said cells is again measured to determine a stimulated response 
profile. The basal o[r] background response profile is then compared with the 
stimulated response profile to identify the cellular response profile to the 
candidate drug." [cols. 7-8] 

In another embodiment of t he invention, a matrix H e., arravl of 
hybridization probes corresponding to a pre d etermined population of p enes nf the 
selected organism is used to specifically detect changes in gene transcription 
which result from exposing the selected organism or cells thereof to a candidate 
drug. In this embodiment, one or more cells derived from the organism is 
exposed to the candidate drug in vivo or ex vivo under conditions wherein the 
drug effects a change in gene transcription in the cell to maintain homeostasis. 
Thereafter, the gene transcripts, primarily mRNA, of the cell or cells is isolated 
. . . [and] then contacted with an ordered matrix [array] of hybridization probes, each 
probe being specific for a different one of the transcripts, under conditions where 
each of the transcripts hybridizes with a corresponding one of the probes to form 
hybridization pairs. The ordered matrix of probes provides, in aggregate, 
complements for an ensemble of genes of the organism sufficient to model the 
transcriptional responsiveness of the organism to a drug. ... The matrix-wide 
signal profile of the drug-stimulated cells is then compared with a matrix-wide 
signal profile of negative control cells to obtain a specific drug response profile. 
[col. 8] 

The invention also provides means for computer-based qualitative analysis 
of candidate drugs and unknown compounds. A wide variety of reference 
response profiles may be generated and used in such analyses. [col. 8] 

Response profiles for an unknown stimulus (e.g. new chemicals, unknown 
compounds or unknown mixtures) may be analyzed by comparing the new 
stimulus response profiles with response profiles to known chemical stimuli. [col. 

The response profile of a new chemical stimulus may also be compared to 
a known genetic response profile for target gene(s). [col. 9] 

The August 11, 1997 press release from the '588 patent's assignee, Acacia Biosciences 
(now part of Merck) (Reference No. 8 attached hereto), and the September 15, 1997 news report 
by Glaser, "Strategies for Target Validation Streamline Evaluation of Leads," Genetic 
Engineering News (Reference No. 9 attached hereto), attest to the commercial value of the methods 
and technology described and claimed in the '588 patent. 

WO 97/13877 ("Measurement of Gene Expression Profiles in Toxicity Determinations"), 
published April 17, 1997, describes an expression profiling technology differing somewhat from 
the use of cDNA microarrays and differing from the genome reporter matrix of the '588 patent; 
but the use of the data is analogous. As per its title, the reference describes use of expression 
profiling in toxicity determinations. In particular, and with emphasis added: 

[T]he invention relates to a method for detecting and monitoring changes 
in gene expression patterns in in vitro and in vivo systems for determining the 
toxicity of drug candidates. [Field of the invention] 

An object of the invention is to provide a new approach to toxicity 
assessment based on an examination of gene expression patterns, or profiles, in in 
vitro or in vivo test systems. [page 3] 

Another object of the invention is to provide a rapid and reliable method 
for correlating gene expression with short term and long term toxicity in test 
animals. [page 3] 

The invention achieves these and other objects by providing a method for 
massively parallel signature sequencing of genes expressed in one or more 
selected tissues of an organism exposed to a test compound. An important feature 
of the invention is the application of novel . . . methodologies that permit the 
formation of gene expression profiles for selected tissues. . . . Such profiles may 
be compared with those from tissues of control organisms at single or multiple 
time points to identify expression patterns predictive of toxicity. [page 3] 

As used herein, the terms 'gene expression profile,' and 'gene expression 
pattern' which is used equivalently, means a frequency distribution of sequences 
of portions of cDNA molecules sampled from a population of tag-cDNA 
conjugates. . . . Preferably, the total number of sequences determined is at least 
1000; more preferably, the total number of sequences determined in a gene 
expression profile is at least ten thousand. [page 7] 

The invention provides a method for determining the toxicity of a 
compound by analyzing changes in the gene expression profiles in selected tissues 
of test organisms exposed to the compound. . . . Gene expression profiles 
derived from test organisms are compared to gene expression profiles derived 
from control organisms. [page 7] 



Therefore, the potential benefit to the public, in terms of lives saved and reduced health 
care costs, is enormous. Evidence of the benefits of this information includes: 

• In 1999, CV Therapeutics, an Incyte collaborator, was able to use Incyte gene 
expression technology, information about the structure of a known transporter 
gene, and chromosomal mapping location, to identify the key gene associated 
with Tangier disease. This discovery took place over a matter of only a few 
weeks, due to the power of these new genomics technologies. The discovery 
received an award from the American Heart Association as one of the top 10 
discoveries associated with heart disease research in 1999. 

• In an April 9, 2000, article published by the Bloomberg news service, an Incyte 
customer stated that it had reduced the time associated with target discovery and 
validation from 36 months to 18 months, through use of Incyte's genomic 
information database. Other Incyte customers have privately reported similar 
experiences. The implications of this significant saving of time and expense for 
the number of drugs that may be developed and their cost are obvious. 

• In a February 10, 2000, article in the Wall Street Journal, one Incyte customer 
stated that over 50 percent of the drug targets in its current pipeline were derived 
from the Incyte database. Other Incyte customers have privately reported similar 
experiences. By doubling the number of targets available to pharmaceutical 
researchers, Incyte genomic information has demonstrably accelerated the 
development of new drugs. 

Because the Patent Examiner failed to address or consider the "well-established" utilities 
for the claimed invention in toxicology testing, drug development, and the diagnosis of disease, 
the Examiner's rejections should be overturned regardless of their merit. 

C. Objective evidence corroborates the utilities of the claimed invention 

There is, in fact, no restriction on the kinds of evidence a Patent Examiner may consider 
in determining whether a "real-world" utility exists. "Real-world" evidence, such as evidence 
showing actual use or commercial success of the invention, can demonstrate conclusive proof of 
utility. Raytheon v. Roper, 220 USPQ 592 (Fed. Cir. 1983); Nestle v. Eugene, 55 F.2d 854, 
856, 12 USPQ 335 (6th Cir. 1932). Indeed, proof that the invention is made, used or sold by any 
person or entity other than the patentee is conclusive proof of utility. United States Steel Corp. 
v. Phillips Petroleum Co., 865 F.2d 1247, 1252, 9 USPQ2d 1461 (Fed. Cir. 1989). 

Over the past several years, a vibrant market has developed for databases containing the 
sequences of all expressed genes (along with the polypeptide translations of those genes), in 
particular genes having medical and pharmaceutical significance such as the instant sequence. 
(Note that the value in these databases is enhanced by their completeness, but each sequence in 
them is independently valuable.) The databases sold by Appellants' assignee, Incyte, include 
exactly the kinds of information made possible by the claimed invention, such as tissue and 
disease associations. Incyte sells its database containing the claimed sequence and millions of 
other sequences throughout the scientific community, including to pharmaceutical companies 
who use the information to develop new pharmaceuticals. 

Both Incyte's customers and the scientific community have acknowledged that Incyte's 
databases have proven to be valuable in, for example, the identification and development of drug 
candidates. Page et al., in discussing the identification and assignment of candidate drug targets, 
state that "rapid identification and assignment of candidate targets and markers represents a huge 
challenge ... [t]he process of annotation is similarly aided by the quantity and richness of the 
sequence specific databases that are currently available, both in the public domain and in the 
private sector (e.g. those supplied by Incyte Pharmaceuticals)." (Page, M.J. et al., "Proteomics: a 
major new technology for the drug discovery process," Drug Discov. Today 4:55-62 (1999) 
(Reference No. 16), see page 58, col. 2.) As Incyte adds information to its databases, including 
the information that can be generated only as a result of Incyte's invention of the claimed 
polynucleotide and its use of that polynucleotide on cDNA microarrays, the databases become 
even more powerful tools. Thus the claimed invention adds more than incremental benefit to the 
drug discovery and development process. 

Customers can, moreover, purchase the claimed polynucleotide directly from Incyte, 
saving the customer the time and expense of isolating and purifying or cloning the 
polynucleotide for research uses such as those described supra. 

III. The Patent Examiner's rejections are without merit 

Rather than responding to the evidence demonstrating utility, the Examiner attempts to 
dismiss it altogether by arguing that the disclosed and well-established utilities for the claimed 
polynucleotide are not "specific, substantial, credible" utilities. (Final Office Action, page 2.) 
The Examiner is incorrect both as a matter of law and as a matter of fact. 

A. The precise biological role or function of an expressed polynucleotide is not 
required to demonstrate utility 

The Patent Examiner's rejection of the claimed invention is based partly on the ground 
that, without information as to the precise "biological significance" (Final Office Action, page 2) 
of the claimed invention, the claimed invention's utility is not sufficiently specific. According to 
the Examiner, it is not enough that a person of ordinary skill in the art could use and, in fact, 
would want to use the claimed invention either by itself or in a cDNA microarray to monitor the 
expression of genes for such applications as the evaluation of a drug's efficacy and toxicity. The 
Examiner would require, in addition, that the applicant provide a specific and substantial 
interpretation of the results generated in any given expression analysis. 

It may be that specific and substantial interpretations and detailed information on 
biological function are necessary to satisfy the requirements for publication in some technical 
journals, but they are not necessary to satisfy the requirements for obtaining a United States 
patent. The relevant question is not, as the Examiner would have it, whether it is known how or 
why the invention works, In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999), but rather 
whether the invention provides an "identifiable benefit" in presently available form. Juicy Whip 
Inc. v. Orange Bang Inc., 185 F.3d 1364, 1366 (Fed. Cir. 1999). If the benefit exists, and there is 
a substantial likelihood the invention provides the benefit, it is useful. There can be no doubt, 
particularly in view of the First Bedilion Declaration (at, e.g., ¶¶ 10 and 15), that the present 
invention meets this test. 

The threshold for determining whether an invention produces an identifiable benefit is 
low. Juicy Whip, 185 F.3d at 1366. Only those utilities that are so nebulous that a person of 
ordinary skill in the art would not know how to achieve an identifiable benefit and, at least 
according to the PTO guidelines, so-called "throwaway" utilities that are not directed to a person 
of ordinary skill in the art at all, do not meet the statutory requirement of utility. Utility 
Examination Guidelines, 66 Fed. Reg. 1092 (Jan. 5, 2001). 

Knowledge of the biological function or significance of a biological molecule has never 
been required to show real-world benefit. In its most recent explanation of its own utility 
guidelines, the PTO acknowledged as much (66 F.R. at 1095): 

[T]he utility of a claimed DNA does not necessarily depend on the 
function of the encoded gene product. A claimed DNA may have specific and 
substantial utility because, e.g., it hybridizes near a disease-associated gene or it 
has gene-regulating activity. 

By implicitly requiring knowledge of biological function for any claimed nucleic acid, 
the Examiner has, contrary to law, elevated what is at most an evidentiary factor into an absolute 
requirement of utility. Rather than looking to the biological role or function of the claimed 
invention, the Examiner should have looked first to the benefits it is alleged to provide. 

B. Membership in a class of useful products can be proof of utility 

Despite the uncontradicted evidence that the claimed polynucleotide encodes a 
polypeptide expressed by humans, the Examiner refused to impute the utility of the members of 
the family of expressed polypeptides to NHRP. 

In order to demonstrate utility by membership in a class, the law requires only that the 
class not contain a substantial number of useless members. So long as the class does not contain 
a substantial number of useless members, there is sufficient likelihood that the claimed invention 
will have utility, and a rejection under 35 U.S.C. § 101 is improper. That is true regardless of 
how the claimed invention ultimately is used and whether or not the members of the class 
possess one utility or many. See Brenner v. Manson, 383 U.S. 519, 532 (1966); Application of 
Kirk, 376 F.2d 936, 943 (CCPA 1967). 

Membership in a "general" class is insufficient to demonstrate utility only if the class 
contains a sufficient number of useless members such that a person of ordinary skill in the art 
could not impute utility by a substantial likelihood. There would be, in that case, a substantial 
likelihood that the claimed invention is one of the useless members of the class. In the few cases 
in which class membership did not prove utility by substantial likelihood, the classes did in fact 
include predominately useless members. E.g., Brenner (man-made steroids); Kirk (same); Natta 
(man-made polyethylene polymers). 

The Examiner addresses NHRP as if the general class in which it is included is not the 
family of expressed polypeptides, but rather all polynucleotides or all polypeptides, including the 
vast majority of useless theoretical molecules not occurring in nature, and thus not pre-selected 
by nature to be useful. While these "general classes" may contain a substantial number of 
useless members, the family of expressed polypeptides does not. The family of expressed 
polypeptides is sufficiently specific to rule out any reasonable possibility that NHRP would not 
also be useful like the other members of the family. 

Because the Examiner has not presented any evidence that the family of expressed 
polypeptides has any, let alone a substantial number, of useless members, the Examiner must 
conclude that there is a "substantial likelihood" that the NHRP encoded by the claimed 
polynucleotide is useful. It follows that the claimed polynucleotide also is useful. 

C. Because the uses of the claimed polynucleotide in toxicology testing, drug 
discovery, and disease diagnosis are practical uses beyond mere study of the invention 
itself, the claimed invention has substantial utility 

As used in toxicology testing, drug discovery, and disease diagnosis, the claimed 
invention has a beneficial use in research other than studying the claimed invention or its protein 
products. It is a tool, rather than an object, of research. The data generated in gene expression 
monitoring using the claimed invention as a tool is not used merely to study the claimed 
polynucleotide itself, but rather to study properties of tissues, cells, and potential drug candidates 
and toxins. Without the claimed invention, the information regarding the properties of tissues, 
cells, drug candidates and toxins is less complete. [First Bedilion Declaration at ¶ 15.] 

The use of the claimed invention as a research tool in toxicology testing is specific and 
substantial. While it is true that all polypeptides and polynucleotides expressed in humans have 
utility in toxicology testing based on the property of being expressed at some time in 
development or in the cell life cycle, this basis for utility does not preclude that utility from being 
specific and substantial. A toxicology test using any particular expressed polypeptide or 
polynucleotide is dependent on the identity of that polypeptide or polynucleotide, not on its 
biological function or its disease association. The results obtained from using any particular 
human-expressed polypeptide or polynucleotide in toxicology testing are specific to both the 
compound being tested and the polypeptide or polynucleotide used in the test. No two human- 
expressed polypeptides or polynucleotides are interchangeable for toxicology testing because the 
effects on the expression of any two such polypeptides or polynucleotides will differ depending 
on the identity of the compound tested and the identities of the two polypeptides or 
polynucleotides. It is not necessary to know the biological functions and disease associations of 
the polypeptides or polynucleotides in order to carry out such toxicology tests. Therefore, at the 
very least, the claimed polynucleotide is a specific control for toxicology tests in developing 
drugs targeted to other polypeptides or polynucleotides, and is clearly useful as such. 

As an example, any histone gene or protein expressed in humans can be used in a specific 
and substantial toxicology test in drug development. A histone gene or protein may not be 
suitable as a target for drug development because disruption of such a gene may kill a patient. 
However, a human-expressed histone gene or protein is surely an excellent subject for toxicology 
studies when developing drugs targeted to other genes or proteins. A drug candidate which alters 
expression of a histone gene or protein is toxic because disruption of such a pervasively- 
expressed gene or protein would have undesirable side effects in a patient. Therefore, when 
testing the toxicology of a drug candidate targeted to another gene or protein, measuring the 
expression of a histone gene or protein is a good measure of the toxicity of that candidate, 
particularly in in vitro cellular assays at an early stage of drug development. The utility of any 
particular human-expressed histone gene or protein in toxicology testing is specific and 
substantial because a toxicology test using that histone gene or protein cannot be replaced by a 
toxicology test using a different gene, including any other histone gene or protein. This specific 
and substantial utility requires no knowledge of the biological function or disease association of 
the histone gene or protein. 

The expression of the SEQ ID NO:74 polynucleotide in human tissues would lead a 
skilled artisan to believe that this polynucleotide has some physiological implications, even if 
these implications have not been precisely identified. During toxicology testing, a change in 
expression of a human-expressed polynucleotide indicates potential toxicity of a drug candidate, 
even if the physiological implications of that polynucleotide or of the polypeptide encoded by 
that polynucleotide are unknown. Such a toxicology test allows one to choose a lead drug 
candidate which has minimal effects on the expression of proteins other than the protein to which 
the candidate is targeted. Such a lead drug candidate would be less likely to have unintended 
side effects than a drug candidate having greater effects on the expression of genes/proteins other 
than the intended drug target. Thus, the benefit of such a toxicology test is an increased chance 
of finding a safe and effective drug, and a corresponding reduction in the expense and time of 
bringing a drug to market. 

The claimed invention has numerous additional uses as a research tool, each of which 
alone is a "substantial utility." These include diagnostic assays (e.g., pages 55-59) and 
chromosomal mapping (e.g., pages 59-60). 

D. The Patent Examiner failed to demonstrate that a person of ordinary skill in 
the art would reasonably doubt the utility of the claimed invention 

The Examiner bases the utility rejection on two issues, that the utilities of the claimed 
polynucleotide in toxicology testing are "not specific to the claimed polynucleotides" and that 
"the results of gene expression monitoring assays would be meaningless without significant 
further research." (Final Office Action, page 5.) Appellants demonstrate below that the claimed 
uses meet the requirement that the claimed invention yield a "specific benefit" and why these 
uses constitute more than "further research" into the claimed invention itself. 

1. Biological function, differential expression, or disease association is 
irrelevant to utility 

The Examiner states that "[a]fter further research, a specific and substantial credible 
utility might be found for the claimed isolated polynucleotides" (Final Office Action, page 3.) 
The Examiner alleges that such a finding of utility would require demonstrated biological 
function, disease association, or differential expression of the claimed polynucleotide. (Final 
Office Action, e.g., pages 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 17, and 21.) The Examiner, however, 
continues to ignore other utilities discussed in the Specification and/or well known in the art, 
such as toxicology testing, alleging that "the results of gene expression monitoring assays would 
be meaningless without significant further research." (Final Office Action, page 5.) 

Appellants have demonstrated a utility for the claimed SEQ ID NO:74 polynucleotide 
and the encoded SEQ ID NO:37 polypeptide irrespective of whether or not a person would wish 
to perform additional experimentation on biological function, disease association, or differential 
expression as another utility. The fact that additional experimentation could be performed to 
determine the biological function, disease association, or differential expression of the claimed 
SEQ ID NO:74 polynucleotide and the encoded SEQ ID NO:37 polypeptide does not preclude, 
and is in fact irrelevant to, the actual utility of the invention. That utility exists today regardless 
of the biological function, disease association, or differential expression of the claimed SEQ ID 
NO:74 polynucleotide and the encoded SEQ ID NO:37 polypeptide. (See, e.g., Rockett 
Declaration, ¶ 18 and Iyer Declaration, ¶ 19.) 

Monitoring the expression of the claimed polynucleotide or the polypeptide encoded by 
the claimed polynucleotide gives important information on the potential toxicity of a drug 
candidate that is specifically targeted to any other polypeptide, regardless of the biological 
function, disease association, or differential expression of the claimed polynucleotide or the 
polypeptide encoded by the claimed polynucleotide. The claimed polynucleotide or the 
polypeptide encoded by the claimed polynucleotide is useful for measuring the toxicity of drug 
candidates specifically targeted to other polynucleotides or polypeptides regardless of any 
possible utility for measuring the properties of the claimed polynucleotide or the polypeptide 
encoded by the claimed polynucleotide. 

2. Use of the claimed polynucleotide in toxicology testing 

The Office Action does not find the Bedilion Declaration persuasive, alleging that 
"any new polynucleotide can be used in a microarray, and thus this asserted utility is not 
specific" and that "the specification does not disclose that NHRP is expressed in at altered levels 
or forms in tissues exhibiting a pathological state." (Final Office Action, page 4.) 

The Examiner's arguments amount to nothing more than the Examiner's disagreement 
with the Bedilion Declaration and the Appellants' assertions about the knowledge of a person of 
ordinary skill in the art, and are tantamount to the substitution of the Examiner's own judgment 
for that of the Appellants' expert. The Examiner must accept the Appellants' assertions to be 
true. The Examiner is, moreover, wrong on the facts because the Bedilion Declaration 
demonstrates how one of skill in the art, reading the specification at the time the parent Lai '870 
application was filed (June 6, 1997), would have understood that specification to disclose the use 
of the claimed polynucleotide in gene expression monitoring for toxicology testing, drug 
development, and the diagnosis of disease. (See the First Bedilion Declaration at, e.g., ¶¶ 10-16.) 

For example, monitoring the expression of the SEQ ID NO:74 polynucleotide is a method 
of testing the toxicology of drug candidates during the drug development process. Dr. Bedilion 
in his Declaration states that "good drugs are not only potent, they are specific. This means that 
they have strong effects on a specific biological target and minimal effects on all other biological 
targets." (First Bedilion Declaration ¶ 10.) Thus, if the expression of a particular polynucleotide 
is affected in any way by exposure to a test compound, and if that particular polynucleotide is not 
the specific target of the test compound (e.g., if the test compound is a drug candidate), then the 
change in expression is an indication that the test compound may have undesirable toxic side 
effects. It is important to note that such an indication of possible toxicity is specific not only for 
each compound tested, but also for each and every individual polynucleotide whose expression is 
being monitored. 

However, the Examiner continues to view the utility in toxicology testing of the claimed 
polynucleotide as requiring knowledge of either the biological function or disease association or 
differential expression of the claimed polynucleotide. The Examiner views toxicology testing as 
a process to measure the toxicity of a drug candidate only when that drug candidate is 
specifically targeted to the claimed polynucleotide. The Examiner has refused to consider that 
the claimed polynucleotide is useful for measuring the toxicity of drug candidates which are 
targeted not to the claimed polynucleotide, but to other polynucleotides. This utility of the 
claimed polynucleotide does not require any knowledge of the biological function or disease 
association or differential expression of the SEQ ID NO:37 polypeptide or SEQ ID NO:74 
polynucleotide and is a specific, substantial and credible utility. (See, e.g., Rockett Declaration, 
¶ 18 and Iyer Declaration, ¶ 19.) 

The Final Office Action emphasizes that "[s]ince any polynucleotide can be used in a 
microarray, such a use is not specific to the claimed polynucleotides" (Final Office Action, page 
5). However, Appellants note that: 

To meet the utility requirement of sections 101 and 112 of the Patent Act, 
the patent applicant need only show that the claimed invention is "practically 
useful," Anderson v. Natta, 480 F.2d 1392, 1397, 178 USPQ 458 (CCPA 1973) 
and confers a "specific benefit" on the public. Brenner v. Manson, 383 U.S. 519, 
534-35, 148 USPQ 689 (1966). 

Practical real-world uses are not limited to uses that are unique to an invention. The law 
requires that the practical utility be "definite," not particular. Montedison, 664 F.2d at 375. 
Appellants are not aware of any court that has rejected an assertion of utility on the grounds that 
it is not "particular" or "unique" to the specific invention. 

3. Discussion of toxicology testing in the Specification 

The Examiner alleges that "the particulars of toxicology testing with the claimed 
polynucleotides are not disclosed in the instant specification." (Final Office Action, page 7.) 
Well-established utilities, such as toxicology testing by the use of cDNA microarrays, need not 
be explicitly disclosed in a patent application. Furthermore, the Examiner's position amounts to 
nothing more than the Examiner's disagreement with the First Bedilion Declaration (which 
purports therefore to substitute the Examiner's judgment for that of Appellants' expert) and 
Appellants' assertions about the knowledge of a person of ordinary skill. The Examiner must 
accept Appellants' assertions to be true. The Final Office Action fails to address the disclosure 
in the instant specification on gene and protein expression monitoring applications, as discussed 
below. 

Support for the utility of the claimed polynucleotide in toxicology testing, as well as for 
utility in drug screening, may be found in the specification. For example, the parent Lai '870 
application discloses that the polynucleotide sequences disclosed therein, including the SEQ ID 
NO:74 polynucleotide, are useful as probes in microarrays. (Lai '870 application, page 58, line 
13 through page 60, line 4 and page 67, line 28 through page 68, line 21.) The Lai '870 
Specification teaches that microarrays can be used "to monitor the expression level of large 
numbers of genes simultaneously (to produce a transcript image)" for a number of purposes, 
including "in developing and in monitoring the activities of therapeutic agents" (Lai '870 
application at page 58, lines 14-18). 

4. Utility of all expressed polynucleotides in toxicology testing 

The Examiner argues that "[s]ince any polynucleotide can be used in a 
microarray, such a use is not specific to the claimed polynucleotides." (Final Office Action, 
page 5.) The Examiner further alleged that "any orphan gene can be used in the microarrays 
described by Rockett et al." (Rockett Declaration, Exhibit C) and that therefore "[t]he asserted 
utility for the claimed polynucleotide is not specific to the claimed polynucleotide." (Final 
Office Action, page 6.) The Examiner does not point to any law, however, that says a utility that 
is shared by a large class is somehow not a utility. If all of the class of expressed 
polynucleotides can be so used, then they all have utility. The issue is, once again, whether the 
claimed polynucleotide and encoded polypeptide have any utility, not whether other compounds 
have a similar utility. Nothing in the law says that an invention must have a "unique" utility. 
Indeed, the whole notion of well-established utilities PRESUPPOSES that many different 
inventions can have the exact same utility (if the Examiner's argument were correct, there could 
never be a well-established utility, because you could always find a generic group with the same 
utility!). 

It is true that just about any expressed polynucleotide will have use as a toxicology 
control, but Appellants need not argue this for the purposes of this case. Appellants argue only 
that this particular claimed invention could be so used, and have provided, e.g., the First Bedilion 
Declaration, the Rockett Declaration, and the Iyer Declaration to back this up. The point is not 
whether or not the claimed polynucleotide is, in any given toxicology test, differentially 
expressed. The point is that the invention provides a useful measuring stick regardless of 
whether there is or is not differential expression. That makes the invention useful today, in the 
real world, for real purposes. 

5. The Final Office Action on page 6 asserts that Appellants have made a 
misplaced analogy by comparing the claimed polynucleotide to a scale. The Examiner asserts 
that a microarray is analogous to a scale while the claimed polynucleotide is analogous to "the 
object being weighed on the scale" which does not necessarily have patentable utility. The 
Examiner further asserts that "[i]t is true that a scale has patentable utility as a research tool" and 
that "microarray technology has patentable utility," but that "the microarray is not being claimed, 
but rather a polynucleotide that can be used in microarrays." (Final Office Action, page 6.) 
With respect to the utility of the claimed polynucleotide in toxicology testing, the Examiner is 
wrong. The claimed polynucleotide may be used as a probe on a microarray. In toxicology 
testing as described above, the claimed polynucleotide is not the object of the research. The 
claimed polynucleotide is a research tool used to assess the toxicity of drug candidates which are 
specifically targeted to other polynucleotides. It is the other polynucleotides and the drug 
candidates which are the object of the research. 

The Examiner further discounts the teaching in the Brown patent, cited by Bedilion in his 
Declaration, stating that "[t]he Brown patent claims methods of forming microarrays." (Final 
Office Action, page 6.) The Examiner ignores the teaching in the Brown patent that: 

In one application, an array of cDNA clones representing genes is 
hybridized with total cDNA from an organism to monitor gene expression for 
research or diagnostic purposes. . . This two-color experiment can be used to 
monitor gene expression in different tissue types, disease states, response to 
drugs, or response to environmental factors. (Brown, column 15, lines 5-7 and 
13-16.) 



In addition to the genetic applications listed above, arrays of whole cells, 
peptides, enzymes, antibodies, antigens, receptors, ligands, phospholipids, 
polymers, drug cogener preparations or chemical substances can be fabricated by 
the means described in this invention for large scale screening assays in medical 
diagnostics, drug discovery, molecular biology, immunology and toxicology 
(Brown, column 15, lines 52-58.) 



6. In addition, the use of an expressed polynucleotide as a control in a 
toxicology test is a specific utility. It is irrelevant whether "[i]n this case, as indicated at the 
bottom of page 18 of the Brief [sic: Response], all nucleic acids and genes are in some 
combination useful in toxicology testing" (Final Office Action, page 7) or whether "the 
technique describe by Rockett ... can be preformed with any polynucleotides." (Final Office 
Action, page 8.) The Examiner implies that a utility is not specific if the process carried out in 
applying that utility to an object can also be carried out on a different object. This is incorrect. 
The fact that one can apply a given process to a number of different objects does not mean that 
the process is not a specific utility when applied to a particular object. In the present case, a 
toxicology test can be carried out using any polynucleotide expressed in humans as a control, 
providing that the polynucleotide is not the target of the toxicology test. In carrying out such a 
test, a particular process can be applied using any expressed polynucleotide. However, each 
toxicology test using a given expressed polynucleotide as a control is a distinct and unique 
toxicology test because the results of the test are dependent on the identity of the expressed 
polynucleotide. A toxicology test using a given expressed polynucleotide is not interchangeable 
with a toxicology test using a different expressed polynucleotide, even if the particular process 
used in carrying out the toxicology tests is identical. The fact that the same series of steps can 
be used to carry out such toxicology tests does not prevent such tests from being a specific 
utility. 

7. The Examiner contends that "use of the claimed polynucleotide in an array 
for toxicology screening is only useful in the sense that the information that is gained from the 
array is dependent on the pattern derived from the array, and says nothing with regard to each 
individual member of the array." (Final Office Action, page 7.) Appellants reiterate that 
each individual claimed polynucleotide has utility, because each expressed polynucleotide 
added to the pool of genes available for use in gene expression technology makes that 
technology (e.g., microarrays) more useful for toxicology testing. Each new 
gene available adds value to the set. The Examiner again ignores the teaching in the First 
Bedilion Declaration, which is tantamount to substituting the Examiner's own opinion for that of 
Appellants' expert. Dr. Bedilion, in his First Declaration, states that the "specification of the Lai 
'870 application would have led a person skilled in the art on June 6, 1997 who was using gene 
expression monitoring in connection with working on developing new drugs for the treatment of 
immune responses and cancers to conclude that a cDNA microarray that contained the SEQ ID 
NO:74 polynucleotide would be a highly useful tool and to request specifically that any cDNA 
microarray that was being used for such purposes contain the SEQ ID NO:74 polynucleotide." 
(First Bedilion Declaration, ¶ 15.) For example, as explained by Dr. Bedilion, "[p]ersons skilled 
in the art would [have appreciated on June 6, 1997] that cDNA microarrays that contained the 
SEQ ID NO:74 polynucleotide would be a more useful tool than cDNA microarrays that did not 
contain the SEQ ID NO:74 polynucleotide in connection with conducting gene expression 
monitoring studies on proposed (or actual) drugs for treating immune responses and cancers for 
such purposes as evaluating their efficacy and toxicity." Id. 

Furthermore, the claimed polynucleotide could be used in techniques that measure gene 
expression in non-microarray formats, such as northern analysis. (First Bedilion Declaration, 
¶ 16.) 

8. The Examiner criticizes Appellants' citation of the commercial success of 
Incyte's databases as evidence of the commercial value of the contained information on the 
claimed polynucleotide. The Examiner argues that "many products which lack patentable utility 
enjoy commercial success, are actually used, and are considered valuable" including "silly fads 
such as pet rocks, but also. . . serious scientific products like orphan receptors." (Final Office 
Action, page 7.) Appellants note that there are at least two U.S. Patents claiming orphan 
receptors (U.S. Patent Nos. 5,958,710 and 6,277,976). 

9. The Examiner questions the utility of the claimed polynucleotide in 
toxicology testing, stating that "[n]either the toxic substances nor the susceptible organ systems 
are identified." (Final Office Action, page 7.) Appellants note that monitoring the expression of 
the claimed polynucleotide is a method of testing the toxicology of drug candidates during the 
drug development process. If the expression of a particular polynucleotide is affected in any way 
by exposure to a test compound, and if that particular polynucleotide (or its encoded 
polypeptide) is not the specific target of the test compound (e.g., if the test compound is a drug 
candidate), then the change in expression is an indication that the test compound may have 
undesirable toxic side effects that may limit its usefulness as a specific drug. Toxicology testing 
using microarrays reduces time needed for drug development by weeding out compounds which 
are not specific to the drug target. Learning this from an array in a gene expression monitoring 
experiment early in the drug development process costs less than learning this, for example, 
during Phase III clinical trials. It is important to note that such an indication of possible toxicity 
is specific not only for each compound tested, but also for each and every individual 
polynucleotide whose expression is being monitored. 



10. Appellants' Invention Has Specific Utility 

The Examiner alleges that "[t]his asserted utility [in toxicology testing] is not specific to 
the claimed polynucleotides, as any DNA can be placed into the microarray in order to carry out 
further research into the expression of said DNA." (Final Office Action, page 5.) 

Appellants' submission of additional Declarations and references overcomes this 
concern. Those Declarations and references demonstrate that, far from applying regardless of 
the specific properties of the claimed invention, the utility of Appellants' claimed polynucleotide 
as a gene-specific probe depends upon specific properties of the polynucleotide, that is, its 
nucleic acid sequence. 

"[E]ach probe on ... [a "high density spotted microarray[]"], with careful design and 
sufficient length, and with sufficiently stringent hybridization and wash conditions, binds 
specifically and with minimal cross-hybridization, to the probe's cognate transcript" (Rockett 
Declaration, ¶ 10(i), emphasis added); "[e]ach gene included as a probe on a microarray provides 
a signal that is specific to the cognate transcript, at least to a first approximation." (Iyer 
Declaration, ¶ 7, emphasis added.) 3 Accordingly, "each additional probe makes an additional 
transcript newly detectable by the microarray, increasing the detection range, and thus versatility, 
of this analytical device for gene expression profiling" (Rockett Declaration, ¶ 10(ii)); equally, 
"[e]ach new gene-specific probe added to a microarray thus increases the number of genes 
detectable by the device, increasing the resolving power of the device." (Iyer Declaration, ¶ 7.) 
Although not required for present purposes, it would be appropriate to state on the record here 
that the specificity of nucleic acid hybridization was well-established far earlier than the 
development of high density spotted microarrays in 1995, and indeed is the well-established 
underpinning of many, perhaps most, molecular biological techniques developed over the past 
30-40 years. 

11. The Examiner's reliance on Brenner v. Manson is misplaced 

This is not a case in which biological function is necessary to provide a link between the 
claimed invention on one hand, and a compound of known utility on the other. Given that the 
claimed invention is disclosed in the Lai '870 application to be useful as a tool in a number of 
gene expression monitoring applications that were well-known at the time of the filing of the 
application in connection with the development of drugs and the monitoring of the activity of 
drugs, the precise biological function (or disease association or differential expression) of the 
claimed polynucleotide or the encoded polypeptide is superfluous information for the purposes 
of establishing utility. 

3 See Iyer Declaration, footnote at ¶ 7, for a slightly more "nuanced" view. 

The uncontested fact that the claimed invention already has a disclosed use as a tool in 
then available technology (such as cDNA microarrays) distinguishes it from those few claimed 
inventions found not to have utility. In each of those cases, unlike this one, the person of 
ordinary skill in the art was left to guess whether the claimed invention could be used to produce 
an identifiable benefit. Thus the Examiner's unsupported statement that one of those cases, 
Brenner v. Manson, 383 U.S. 519, 148 USPQ 689 (1966), is somehow analogous to this case is 
plainly incorrect. (Final Office Action, page 3.) 

Brenner concerns a narrow exception to the general rule that inventions are useful. It 
holds that where the assertion of utility for the claimed invention is made by association with a 
group including useful members, the group may not include so many useless members that there 
would be less than a substantial likelihood that the claimed invention is in fact one of the useful 
members of the group. In Brenner, the claimed invention was a process for making a synthetic 
steroid. Some steroids are useful, but most are not. While the claimed process in Brenner 
produced a composition that bore homology to some useful steroids, antitumor agents, it also 
bore structural homology to a substantial number of steroids having no utility at all. There was 
no evidence that could show, by substantial likelihood, that the claimed invention would produce 
the benefits of the small subset of useful steroids. It was entirely possible, and indeed likely, that 
the claimed invention was just as useless as the majority of steroids. 

In Brenner, the steroid was not disclosed in the application for a patent to be useful in its 
then-present form. Here, in contrast, the claimed SEQ ID NO:74 polynucleotide is an expressed 
polynucleotide that was disclosed to be useful in the Lai '870 application for many known 
applications involving gene expression monitoring analysis. Its utility is not a matter of 
guesswork. It is not a random DNA or protein sequence that might or might not be useful as a 
scientific tool. Unlike the steroid in Brenner, the utility of the invention claimed here is not 
grounded upon being structurally analogous to a molecule which belongs to a class of molecules 
containing a significant number of useless compositions. 

And, the utilities disclosed in the application are for purposes other than just studying the 
claimed invention itself, Brenner, 383 U.S. at 535, i.e., for other (non self-referential) uses such 
as to ascertain the toxic potential of a drug candidate and to study the efficacy of a proposed 
drug. Indeed, in view of the First Bedilion Declaration (at, e.g., ¶ 15), the evidence shows that 
persons skilled in the art on June 6, 1997, who read the Lai '870 application, would have 
believed the claimed polynucleotide to be so useful that they would request it to be included as a 
probe in cDNA microarrays for conducting gene expression analyses in association with 
identifying drugs for treating immune responses and cancers. 

Accordingly, in this case, biological function (or disease association or differential 
expression) is in fact superfluous information for the purposes of demonstrating utility. Here, the 
claimed invention is more than "substantially likely" to be useful, in a way that is utterly 
independent of knowledge of precise biological function, as the First Bedilion Declaration, the 
Rockett Declaration, the Iyer Declaration, and other evidence presented by the Appellants 
demonstrate. Given that the claimed invention has disclosed and well-established utilities, the 
Appellants need not demonstrate utility by imputation, or by showing disease association or 
differential expression. 

In the end, the Examiner has failed to recognize that new technologies, such as those 
involving the use of cDNA microarrays to conduct gene expression analyses, have made useful 
biological molecules that might not otherwise have been useful in the past. See Brenner, 383 
U.S. at 536. Technology has now advanced well beyond the point that a person of ordinary skill 
in the art would have to guess whether a newly discovered expressed polynucleotide or protein 
could be usefully employed without further research. It has created a need for new tools, such as 
the claimed polynucleotide, that provide, and have been providing for some time now, 
unquestioned commercial and scientific benefits, and real-world benefits to the public by 
enabling faster, cheaper and safer drug discovery processes. The Examiner is obliged, by law, to 
recognize this reality. 

IV. By requiring the patent applicant to assert a particular or unique utility, the Patent 
Examination Utility Guidelines and Training Materials applied by the Patent 
Examiner misstate the law 

There is an additional, independent reason to overturn the rejections: to the extent the 
rejections are based on Revised Interim Utility Examination Guidelines (64 FR 71427, 
December 21, 1999), the final Utility Examination Guidelines (66 FR 1092, January 5, 2001) 
and/or the Revised Interim Utility Guidelines Training Materials (USPTO Website 
www.uspto.gov, March 1, 2000), the Guidelines and Training Materials are themselves 
inconsistent with the law. 

The Training Materials, which direct the Examiners regarding how to apply the Utility 
Guidelines, address the issue of specificity with reference to two kinds of asserted utilities: 
"specific" utilities which meet the statutory requirements, and "general" utilities which do not. 
The Training Materials define a "specific utility" as follows: 

A [specific utility] is specific to the subject matter claimed. This contrasts to general 
utility that would be applicable to the broad class of invention. For example, a claim to a 
polynucleotide whose use is disclosed simply as "gene probe" or "chromosome marker" would 
not be considered to be specific in the absence of a disclosure of a specific DNA target. 
Similarly, a general statement of diagnostic utility, such as diagnosing an unspecified disease, 
would ordinarily be insufficient absent a disclosure of what condition can be diagnosed. 

The Training Materials distinguish between "specific" and "general" utilities by assessing 
whether the asserted utility is sufficiently "particular," i.e., unique (Training Materials at page 
52) as compared to the "broad class of invention." (In this regard, the Training Materials appear 
to parallel the view set forth in Stephen G. Kunin, Written Description Guidelines and Utility 
Guidelines, 82 J.P.T.O.S. 77, 97 (Feb. 2000) ("With regard to the issue of specific utility the 
question to ask is whether or not a utility set forth in the specification is particular to the claimed 
invention.")). 

Such "unique" or "particular" utilities never have been required by the law. To meet the 
utility requirement, the invention need only be "practically useful," Natta, 480 F.2d at 1397, 
and confer a "specific benefit" on the public. Brenner, 383 U.S. at 534. Thus, incredible 
"throwaway" utilities, such as trying to "patent a transgenic mouse by saying it makes great 
snake food," do not meet this standard. Karen Hall, Genomic Warfare, The American Lawyer 
68 (June 2000) (quoting John Doll, Chief of the Biotech Section of USPTO). 

This does not preclude, however, a general utility, contrary to the statement in the 
Training Materials where "specific utility" is defined (page 5). Practical real-world uses are not 
limited to uses that are unique to an invention. The law requires that the practical utility be 
"definite," not particular. Montedison, 664 F.2d at 375. Appellants are not aware of any court 
that has rejected an assertion of utility on the grounds that it is not "particular" or "unique" to the 
specific invention. Where courts have found utility to be too "general," it has been in those cases 
in which the asserted utility in the patent disclosure was not a practical use that conferred a 
specific benefit. That is, a person of ordinary skill in the art would have been left to guess as to 
how to benefit at all from the invention. In Kirk, for example, the CCPA held the assertion that a 
man-made steroid had "useful biological activity" was insufficient where there was no 
information in the specification as to how that biological activity could be practically used. Kirk, 
376 F.2d at 941. 

The fact that an invention can have a particular use does not provide a basis for requiring 
a particular use. See Brana, supra (disclosure describing a claimed antitumor compound as 
being homologous to an antitumor compound having activity against a "particular" type of 
cancer was determined to satisfy the specificity requirement). "Particularity" is not and never 
has been the sine qua non of utility; it is, at most, one of many factors to be considered. 

As described supra, broad classes of inventions can satisfy the utility requirement so long 
as a person of ordinary skill in the art would understand how to achieve a practical benefit from 
knowledge of the class. Only classes that encompass a significant portion of nonuseful members 
would fail to meet the utility requirement. Supra § III.B. (Montedison, 664 F.2d at 374-75). 

The Training Materials fail to distinguish between broad classes that convey information 
of practical utility and those that do not, lumping all of them into the latter, unpatentable 
category of "general" utilities. As a result, the Training Materials paint with too broad a brush. 
Rigorously applied, they would render unpatentable whole categories of inventions that 
heretofore have been considered to be patentable and that have indisputably benefited the public, 
including the claimed invention. See supra § III.B. Thus the Training Materials cannot be 
applied consistently with the law. 

Issue 2: Enablement Rejection of Claims 25-33, 39, 41, 43, 44, and 45 

The rejection set forth in the Final Office Action is based on the assertions discussed 
above, i.e., that the claimed invention lacks patentable utility. To the extent that the rejection 
under 35 U.S.C. § 112, first paragraph, is based on the improper allegation of lack of patentable 
utility under 35 U.S.C. § 101, it fails for the same reasons. 

Issue 3: Enablement Rejection of Claims 25, 28-30, 32, 33, 39, 41, and 43-45 with 
respect to fragments, variants, arrays, complementary sequences, and RNA equivalents 

The Examiner further contended that the claimed polynucleotides encoding variants of 
SEQ ID NO:37, polynucleotides encoding fragments of SEQ ID NO:37, polynucleotide variants 
of SEQ ID NO:74, fragments of SEQ ID NO:74, fragments of polynucleotide variants of SEQ ID 
NO:74, complementary polynucleotide sequences and RNA equivalents to the above, and arrays 
comprising the above are not enabled. The Examiner states that "[t]he specification does not 
enable any person skilled in the art to which it pertains or with which it is most nearly connected, 
to make/use the invention commensurate in scope with these claims." (Final Office Action, page 
22.) 

The claimed polynucleotides are enabled, i.e., they are supported by the Specification and 
what is well known in the art. 

I. How to make 

SEQ ID NO:37 and SEQ ID NO:74 are specifically disclosed in the application (see, for 
example, pages 95-96 and pages 113-114 of the Sequence Listing). Variants of SEQ ID NO:37 
and SEQ ID NO:74 are disclosed, for example, on page 33, lines 1-18. Incyte clones in which 
the nucleic acids encoding the human NHRP-37 were first identified and libraries from which 
those clones were isolated are disclosed, for example, on page 32, lines 18-23. Chemical and 
structural features of NHRP-37 are disclosed, for example, on page 32, lines 24-30. 

The Examiner alleged that "even a single amino acid substitution or what appears to be a 
minor modification will often dramatically affect the biological activity of a protein," and "it 
could not be predicted that a variant polynucleotide, or polynucleotide encoding a variant protein 
would have equivalent functional characteristic of the polynucleotide which encodes SEQ ID 
NO:37." (Final Office Action, page 23.) However, Appellants submit that the polypeptide 
variant sequences and polynucleotide variant sequences are described by their being "naturally 
occurring" and by their percentage sequence identity with SEQ ID NO:37 and SEQ ID NO:74 
and not by biological activity. The choice of amino acids or nucleotides to alter is made by 
nature. "Naturally occurring" polypeptide variant sequences and polynucleotide variant 
sequences occur in nature; they are not created exclusively in a laboratory. The Specification 
teaches how to find polynucleotide variants (e.g., page 55, lines 19-23) which can then be 
expressed to make polypeptide variants and how to use BLAST to determine whether a given 
naturally occurring polynucleotide sequence falls within the "at least 95% identical to the 
polynucleotide sequence of SEQ ID NO:74" scope and whether a given naturally occurring 
amino acid sequence falls within the "at least 95% identical to the amino acid sequence of SEQ 
ID NO:37" scope (e.g., page 63, line 10 through page 64, line 5). In addition, determination of 
percentage identity is well known in the art. 

The making of the claimed polynucleotides and RNA equivalents by recombinant and 
chemical synthetic methods is disclosed in the Specification, at, e.g., page 33, line 29 through 
page 34, line 3, page 36, line 30 through page 37, line 2, and page 37, lines 16-26. The making 
of the claimed arrays is disclosed in the Specification at, e.g., page 58, line 14 through page 59, 
line 13, and page 67, line 22 through page 68, line 10. The making of the claimed 
polynucleotides comprising complementary sequences is disclosed in the Specification at, e.g., 
page 48, lines 26-29, and page 68, lines 16-25. 

Appellants submit that the specification fully enables the making of the claimed 
polynucleotides encoding immunogenic fragments of SEQ ID NO:37. The polypeptide sequence 
of SEQ ID NO:37 is provided in the Sequence Listing. Preparation of immunogenic fragments is 
described in the Specification, e.g., at page 47, lines 4-10 and page 69, line 22 through page 70, 
line 2. 

Whether a given fragment is "immunogenic" can be tested by its ability to induce a specific 
immune response in animals or cells, to bind with specific antibodies, or to elicit production of 
antibodies that bind to the full-length NHRP-37 (see Specification at, e.g., page 12, lines 6-8, 
page 46, line 22 through page 48, line 14, and page 69, line 21 through page 70, line 6). Testing 
fragments by these methods does not require undue experimentation; the Specification provides 
a test for antibody binding, e.g., at page 61, lines 13-16. The making 
of antibodies is disclosed in the Specification at, e.g., page 46, line 22 through page 48, line 14 
and page 69, line 21 through page 70, line 6. 

This satisfies the "how to make" requirement of 35 U.S.C. § 112, first paragraph. 

II. How to Use 

The claimed polynucleotide variants, fragments, RNA equivalents, and complementary 
sequences are products of expressed genes. The claimed arrays comprise products of expressed 
genes. Therefore, these polynucleotides and arrays are useful for the same purposes as the 
polynucleotides comprising the polynucleotide sequence of SEQ ID NO:74 and the 
polynucleotide encoding the polypeptide sequence of SEQ ID NO:37. These utilities are 
described fully under the rejection under § 101 (Issue 1, supra) of this Brief and in the First 
Bedilion Declaration, Rockett Declaration, Iyer Declaration, and Second Bedilion Declaration. 
In addition, the Specification discloses the use of complementary polynucleotides in antisense 
technology, e.g., on page 11, line 25 through page 12, line 3, page 48, lines 16-23, page 49, lines 
7-18, page 58, lines 18-26, and page 68, lines 16-25. In addition, the Specification discloses the 
use of arrays, e.g., on page 58, line 8 through page 59, line 28 and page 67, line 21 through page 
68, line 14. This satisfies the "how to use" requirement of 35 U.S.C. § 112, first paragraph. 

The Examiner cited Burgess et al., Lazar et al., Mathews and Van Holde, Matthews, and 
Bork in support of the argument that the claimed variant polynucleotides and recited variant 
polypeptides may have different biological functions than SEQ ID NO:74 and SEQ ID NO:37. 
However, these documents do not support the enablement rejection, as the Specification, along 
with what is well known to one of skill in the art, enables the use of the claimed variant 
polynucleotides and the claimed polynucleotides encoding variant polypeptides in toxicology 
testing by virtue of their being expressed polynucleotides, or encoding expressed polypeptides, 
regardless of their biological function. The Examiner has confused use with biological function. 

The Examiner further contends that "the specification does not teach how to use a probe 
comprising a sense strand of SEQ ID NO:74 or a sense strand of a naturally occurring variant of 
SEQ ID NO:74." (Final Office Action, page 23.) One of skill would be able to use 
embodiments of Claim 32 in an array. Furthermore, the Specification teaches the use of these 
polynucleotides in PCR reactions in methods of measuring the expression of the claimed 
polynucleotides, e.g.: 

Additional diagnostic uses for oligonucleotides designed from the 
sequences encoding NHRP may involve the use of PCR. Such oligomers may be 
chemically synthesized, generated enzymatically, or produced in vitro. 
Oligomers will preferably consist of two nucleotide sequences, one with sense 
orientation (5'->3') and another with antisense (3'<-5'), employed under 
optimized conditions for identification of a specific gene or condition. The same 
two oligomers, nested sets of oligomers, or even a degenerate pool of oligomers 
may be employed under less stringent conditions for detection and/or quantitation 
of closely related DNA or RNA sequences. (Specification, page 57, lines 23-30, 
emphasis added.) 

The Examiner further alleges that "Applicant has not taught how o [sic] use an antibody 
that binds to the polypeptide encoded by the claimed polynucleotide because there is not 
biological function, significance, or correlation to a disease state associated with the disclosed 
polynucleotide" and that "[o]ne of skill would not know how to use antibodies which bind to 
SEQ ID NO:37 or antibodies which bind to variant of SEQ ID NO:37 having at least 95% 
sequence similarity to SEQ ID NO:37 for the same reason." (Final Office Action, page 27.) 

Antibodies which bind to polypeptides encoded by the claimed polynucleotides can be 
used, e.g., in toxicology testing to measure the expression of polypeptides encoded by the 
claimed polynucleotides. For example, Rockett discusses antibody microarrays in his 
Declaration stating that: 

Although the protein expression profiles produced by 2D-PAGE analysis 
are analogous to the transcript expression profiles provided by nucleic acid 
microarrays, an even closer analogy is perhaps offered by antibody microarrays - 
as I note in my Drug Discovery Today commentary, such antibody microarrays 
date back to the work of Roger Ekins in the mid- to late-1980s (Rockett 
Declaration, ¶ 13.) 

Rockett cites the following publications with respect to antibody microarrays. (Ekins et 
al., J. Bioluminescence Chemiluminescence 5:59-78 (1989); Ekins et al., Clin. Chem. 37:1955-1965 
(1991); and Ekins, U.S. Patent Nos. 5,432,099, 5,807,755, and 5,837,551.) (Rockett 
Declaration, ¶ 13 and Rockett Exhibits M to Q.) 

Rockett further states with respect to antibody microarrays that 

... as with nucleic acid microarrays, the greater the number of proteins 
detectable, the greater the power of the technique; the absence or failure of a 
protein to change in expression levels does not diminish the usefulness of the 
method; and prior knowledge of the biological function of the protein is not 
required. As applied to protein expression profiling, these principles have been 
well understood since at least as early as the 1980s. (Rockett Declaration, ¶14.)



Issue 4: Written Description Rejection of Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 
45 with respect to allegedly new matter 

The Examiner rejected Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 45 under 35 U.S.C. 
§112, first paragraph, stating that the claims were not adequately described because they 
allegedly contain "new matter." 

I. With respect to "naturally occurring" and "at least 95% identical to" 

The Examiner alleged that "a naturally occurring amino sequence at least 95% identical to
the amino acid sequence of SEQ ID NO:37 and a polynucleotide comprising a naturally 
occurring polynucleotide sequence at least 95% identical to the polynucleotide of SEQ ID 
NO:74" were not supported in the original disclosure. The Examiner noted that "[t]he 
specification contemplates allelic sequences on page 10, lines 1-7, and NHRP variants having 
90% sequence identity [to] the NHRP sequence, however, this is not adequate basis for naturally 
occurring amino acid sequences having at least 90% identity to SEQ ID NO:37 or naturally 
occurring polynucleotide sequences having 90% sequence identity to SEQ ID NO:74, or a 
variant which is at least 95% identical to SEQ ID NO:37 encoded by SEQ ID NO:74, or a 
polynucleotide comprising an allelic sequence having at least 95% identity to SEQ ID NO:74.)" 
(Final Office Action, page 29.) Appellants note that the claims were amended to recite "at least 
95% identical" with the Response filed January 27, 2003; the claims no longer recite "at least 
90% identical." 

Naturally occurring polypeptide sequences are supported in the Specification, e.g., at 
page 9, lines 23-26: 

NHRP, as used herein, refers to the amino acid sequences of substantially
purified NHRP obtained from any species, particularly mammalian, including
bovine, ovine, porcine, murine, equine, and preferably human, from any source 
whether natural, synthetic, semi-synthetic, or recombinant. 

Polypeptides comprising a sequence at least 95% identical to the amino acid sequence of 
SEQ ID NO:37 are supported in the Specification, e.g., at page 33, lines 3-5: 

A most preferred NHRP variant is one having at least 95% amino acid 
sequence identity to an NHRP disclosed herein (SEQ ID NOs:1-37).

Case law provides that to fulfill the written description requirement of 35 U.S.C. §112, 
first paragraph, ". . . the applicant must also convey with reasonable clarity to those skilled in 
the art that, as of the filing date sought, he or she was in possession of the invention. The 
invention is, for purposes of the 'written description' inquiry, whatever is now claimed."
Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117 (Fed. Cir. 1991). Consideration of the
originally filed application shows that Appellants were in possession of what is now claimed, 
i.e., "a naturally occurring polynucleotide sequence at least 95% identical to the polynucleotide 
sequence of SEQ ID NO:74."

In this regard, see the following portions of the Specification as well as those cited above: 

It will be appreciated by those skilled in the art that as a result of the 
degeneracy of the genetic code, a multitude of nucleotide sequences encoding 
NHRP, some bearing minimal homology to the nucleotide sequences of any 
known and naturally occurring gene, may be produced. Thus, the invention 
contemplates each and every possible variation of nucleotide sequence that could 
be made by selecting combinations based on possible codon choices. These 
combinations are made in accordance with the standard triplet genetic code as 
applied to the nucleotide sequence of naturally occurring NHRP, and all such 
variations are to be considered as being specifically disclosed. (Specification,
page 33, lines 11-18.) 

Before the present proteins, nucleotide sequences, and methods are 
described, it is understood that this invention is not limited to the particular 
methodology, protocols, cell lines, vectors, and reagents described, as these may 
vary. It is also to be understood that the terminology used herein is for the 
purpose of describing particular embodiments only, and is not intended to limit 
the scope of the present invention which will be limited only by the appended 
claims. (Specification, page 9, lines 1-6.) 



Thus, while the originally filed application does not contain a verbatim recitation of the
present "at least 95% identical to the polynucleotide sequence. . ." claim language, it is apparent
that the inventors contemplated naturally occurring polynucleotide sequences of NHRP
molecules at least 95% identical to the polynucleotide sequence of SEQ ID NO:74 by virtue of
contemplating naturally occurring polypeptide sequences of NHRP molecules at least 95%
identical to the polypeptide sequence of SEQ ID NO:37.

Accordingly, the "at least 95% identical to the polynucleotide sequence . . ." language 
appearing in Claim 32 does not represent new matter. 



The Examiner further alleges that "[t]he specification or originally filed claims did not
contemplate arrays comprising oligonucleotides complementary to polynucleotide having 95%
identity to SEQ ID NO:74." (Final Office Action, page 29.) Appellants submit that such arrays
are contemplated in the Specification. See the discussion supra, the Specification at, e.g., page
12, lines 9-17, and page 68, lines 16-25, as well as below.

In further embodiments, oligonucleotides derived from any of the 
polynucleotide sequences described herein may be used in microarrays. 

(Specification, page 58, lines 8-9, emphasis added.) 

In another embodiment of the invention, the polynucleotides encoding 
NHRP may be used for diagnostic purposes. The polynucleotides which may be 
used include oligonucleotide sequences, complementary RNA and DNA 
molecules, and PNAs. The polynucleotides may be used to detect and quantitate 
gene expression in biopsied tissues in which expression of NHRP may be 
correlated with disease. The diagnostic assay may be used to distinguish between 
absence, presence, and excess expression of NHRP, and to monitor regulation of 
NHRP levels during therapeutic intervention. 

In one aspect, hybridization with PCR probes which are capable of 
detecting polynucleotide sequences, including genomic sequences, encoding 
NHRP or closely related molecules, may be used to identify nucleic acid 
sequences which encode NHRP. The specificity of the probe, whether it is made 
from a highly specific region, e.g., 10 unique nucleotides in the 5' regulatory 
region, or a less specific region, e.g., especially in the 3' coding region, and the 
stringency of the hybridization or amplification (maximal, high, intermediate, or 
low) will determine whether the probe identifies only naturally occurring 
sequences encoding NHRP, alleles, or related sequences. 

Probes may also be used for the detection of related sequences, and 
should preferably contain at least 50% of the nucleotides from any of the 
NHRP encoding sequences. The hybridization probes of the subject invention 
may be DNA or RNA and derived from the nucleotide sequence of SEQ ID 
NOs:38-74 or from genomic sequence including promoter, enhancer elements, 
and introns of the naturally occurring NHRP. (Specification, page 55, lines 4-23, 
emphasis added.) 

II. Claim 33, with respect to "60 contiguous nucleotides of a polynucleotide of claim 32"

The Examiner rejected Claim 33 on the basis of new matter, stating that "claim 33 
persists in incorporating the limitation of 60 consecutive nucleotides of claim 32, although page 
17, lines 14-17 [of the Office Action mailed September 25, 2002] state that the new limitation of 
'60 consecutive nucleotides' was not contemplated in the specification or claims as originally 
filed." (Final Office Action, page 29.) 

The polynucleotides of Claim 33 are supported in the Specification as filed. For 
example, at page 15, lines 9-10: '"Fragments' are those nucleic acid sequences which are 
greater than 60 nucleotides than [sic] in length." At page 7, lines 3-4: "In another aspect the 
invention provides compositions comprising isolated and purified polynucleotide sequences of 
SEQ ID NOs:38-74 or fragments thereof." At page 15, lines 12-13: "The term 'oligonucleotide' 
refers to a nucleic acid sequence of at least about 6 nucleotides to about 60 nucleotides." 

Issue 5: Written Description Rejection of Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 
45 with respect to polynucleotides comprising a naturally-occurring polynucleotide 
sequence at least 95% identical to the polynucleotide sequence of SEQ ID NO:74 or the 
claimed polynucleotide encoding a polypeptide comprising a naturally-occurring amino 
acid sequence at least 95% identical to the amino acid sequence of SEQ ID NO:37

Claims 25, 28, 29, 30, 32, 33, 39, 41, 44, and 45 have been further rejected under the first 
paragraph of 35 U.S.C. §112 for alleged lack of an adequate written description. The Examiner 
alleged that "the written description is not commensurate in scope with the claims drawn to 
polynucleotides encoding naturally occurring amino acids [sic] sequences having 95% sequence 
identity to SEQ ID NO:37 or polynucleotides comprising a naturally occurring polynucleotide 
sequences [sic] at least 95% identical to SEQ ID NO:74." (Final Office Action, page 31.) The 
Examiner further alleged that "neither the common attributes of the genus nor specific examples 
of species representative of the genus have been described" and "[w]ith the exception of SEQ ID
NO:74, and the polynucleotides encoding SEQ ID NO:37, the skilled artisan cannot envision the 
detailed structure of the encompassed polynucleotides and therefore conception is not achieved 
until reduction to practice has occurred, regardless of the complexity or simplicity of the method 
of isolation." (Final Office Action, page 32.) 

The requirements necessary to fulfill the written description requirement of 35 U.S.C.
§112, first paragraph, are well established by case law.

... the applicant must also convey with reasonable clarity to those skilled
in the art that, as of the filing date sought, he or she was in possession of the
invention. The invention is, for purposes of the "written description" inquiry,
whatever is now claimed. Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117
(Fed. Cir. 1991).

Attention is also drawn to the Patent and Trademark Office's own "Guidelines for 
Examination of Patent Applications Under the 35 U.S.C. Sec. 112, para. 1", published January 5,
2001, which provide that:

An applicant may also show that an invention is complete by disclosure of 
sufficiently detailed, relevant identifying characteristics which provide evidence 
that applicant was in possession of the claimed invention, i.e., complete or partial 
structure, other physical and/or chemical properties, functional characteristics 
when coupled with a known or disclosed correlation between function and 
structure, or some combination of such characteristics. What is conventional or 
well known to one of ordinary skill in the art need not be disclosed in detail. 
If a skilled artisan would have understood the inventor to be in possession of 
the claimed invention at the time of filing, even if every nuance of the claims 
is not explicitly described in the specification, then the adequate description 
requirement is met. (citations omitted, emphasis added.) 

Thus, the written description standard is fulfilled by both what is specifically disclosed 
and what is conventional or well known to one skilled in the art. 

SEQ ID NO:37 and SEQ ID NO:74 are specifically disclosed in the application (see, for 
example, pages 95-96 and 113-114 of the Sequence Listing). Variants of SEQ ID NO:37 are 
described, for example, at page 17, lines 8-16. In particular, the preferred, more preferred, and 
most preferred SEQ ID NO:37 variants (80%, 90%, and 95% amino acid sequence identity to 
SEQ ID NO:37) are described, for example, at page 33, lines 1-5. Incyte clones in which the 
nucleic acids encoding the human NHRP-37 were first identified and libraries from which those 
clones were isolated are described, for example, at page 32, lines 18-23 of the Specification. 
Chemical and structural features of NHRP-37 are described, for example, on page 32, lines 24- 
30. Given SEQ ID NO:37, one of ordinary skill in the art would recognize naturally-occurring 
variants of SEQ ID NO:37 at least 95% identical to SEQ ID NO:37. Given SEQ ID NO:74, one 
of ordinary skill in the art would recognize naturally-occurring variants of SEQ ID NO:74 at 
least 95% identical to SEQ ID NO:74. The Specification describes (e.g., page 63, line 10 
through page 64, line 5) how to use BLAST to determine whether a given sequence falls within 
the "at least 95% identical" scope. Immunogenic fragments are described in the Specification, 
e.g., at page 12, lines 6-8. 
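
By way of illustration only, the "at least 95% identical" criterion can be expressed as a simple computation on two sequences that have already been aligned. The following Python sketch is not the BLAST procedure described in the Specification. It assumes a hypothetical pre-computed pairwise alignment (two equal-length strings with "-" marking gaps), treats any column containing a gap as a mismatch, and uses the hypothetical function names percent_identity and meets_identity_threshold.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity over the columns of a pre-computed alignment.

    Assumption: both strings come from the same pairwise alignment, so they
    have equal length, and '-' marks a gap. Columns where both sequences
    have a gap are ignored; a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 95.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "ACDEFGHIKLMNPQRSTVWY"   # illustrative 20-residue sequence
    candidate = "ACDEFGHIKLMNPQRSTVWA"   # differs at one position
    print(percent_identity(reference, candidate))
    print(meets_identity_threshold(reference, candidate))
```

A candidate sequence would fall within the recited scope, under these assumptions, only if the second function returns True when the candidate is compared against SEQ ID NO:37 or SEQ ID NO:74.
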

There simply is no requirement that the claims recite particular variant and fragment
polypeptide or polynucleotide sequences because the claims already provide sufficient structural
definition of the claimed subject matter. That is, the polypeptide variants and fragments are
defined in terms of SEQ ID NO:37 ("An isolated polynucleotide encoding a polypeptide selected
from the group consisting of. . . b) a polypeptide comprising a naturally occurring amino acid
sequence at least 95% identical to the amino acid sequence of SEQ ID NO:37, and c) an
immunogenic fragment of a polypeptide having the amino acid sequence of SEQ ID NO:37.")
The polynucleotide variants and fragments are defined in terms of SEQ ID NO:74 ("An isolated
polynucleotide selected from the group consisting of. . . : b) a polynucleotide comprising a
naturally occurring polynucleotide sequence at least 95% identical to the polynucleotide
sequence of SEQ ID NO:74;" "An isolated polynucleotide comprising at least 60 contiguous
nucleotides of a polynucleotide of claim 32.")

Because the recited polypeptide variants and fragments are defined in terms of SEQ ID 
NO:37, and the recited polynucleotide variants and fragments are defined in terms of SEQ ID 
NO:37 and SEQ ID NO:74, the precise chemical structure of every polypeptide variant and 
fragment and every polynucleotide variant and fragment within the scope of the claims can be 
discerned. The Examiner's position is nothing more than a misguided attempt to require 
Appellants to unduly limit the scope of their claimed invention. Appellants further submit that 
given the polypeptide sequence of SEQ ID NO:37 and the polynucleotide sequence of SEQ ID 
NO:74, it would be redundant to list specific fragments. The structures of SEQ ID NO:37 and 
SEQ ID NO:74 provide the blueprint for all fragments thereof. Listing all possible fragments of 
SEQ ID NO:37 and SEQ ID NO:74 is, thus, a superfluous exercise which would needlessly
clutter the Specification. Accordingly, the Specification provides an adequate written 
description of the recited polypeptide and polynucleotide sequences. 

I. The present claims specifically define the claimed genus through the recitation of 
chemical structure 

Court cases in which "DNA claims" have been at issue commonly emphasize that the 
recitation of structural features or chemical or physical properties is an important factor to
consider in a written description analysis of such claims. For example, in Fiers v. Revel, 25
USPQ2d 1601, 1606 (Fed. Cir. 1993), the court stated that: 

If a conception of a DNA requires a precise definition, such as by
structure, formula, chemical name or physical properties, as we have held, then a
description also requires that degree of specificity.

In a number of instances in which claims to DNA have been found invalid, the courts
have noted that the claims attempted to define the claimed DNA in terms of functional
characteristics without any reference to structural features. As set forth by the court in
University of California v. Eli Lilly and Co., 43 USPQ2d 1398, 1406 (Fed. Cir. 1997):

In claims to genetic material, however, a generic statement such as 
"vertebrate insulin cDNA" or "mammalian insulin cDNA," without more, is not 
an adequate written description of the genus because it does not distinguish the 
claimed genus from others, except by function. 

Thus, the mere recitation of functional characteristics of a DNA, without the definition of 
structural features, has been a common basis by which courts have found invalid claims to DNA. 
For example, in Lilly, 43 USPQ2d at 1407, the court found invalid for violation of the written 
description requirement the following claim of U.S. Patent No. 4,652,525: 

1. A recombinant plasmid replicable in procaryotic host containing within 
its nucleotide sequence a subsequence having the structure of the reverse 
transcript of an mRNA of a vertebrate, which mRNA encodes insulin. 



In Fiers, 25 USPQ2d at 1603, the parties were in an interference involving the following
count:

A DNA which consists essentially of a DNA which codes for a human 
fibroblast interferon-beta polypeptide. 

Party Revel in the Fiers case argued that its foreign priority application contained an 
adequate written description of the DNA of the count because that application mentioned a 
potential method for isolating the DNA. The Revel priority application, however, did not have a 
description of any particular DNA structure corresponding to the DNA of the count. The court 
therefore found that the Revel priority application lacked an adequate written description of the 
subject matter of the count. 

Thus, in Lilly and Fiers, nucleic acids were defined on the basis of functional 
characteristics and were found not to comply with the written description requirement of 35 
U.S.C. §112; i.e., "an mRNA of a vertebrate, which mRNA encodes insulin" in Lilly, and "DNA
which codes for a human fibroblast interferon-beta polypeptide" in Fiers. In contrast to the
situation in Lilly and Fiers, the claims at issue in the present application define polynucleotides
and polypeptides in terms of chemical structure, rather than functional characteristics. For
example, the "variant language" of independent Claims 25 and 32 recites chemical structure to 
define the claimed genus: 

25. An isolated polynucleotide encoding a polypeptide selected from the 
group consisting of:. . . 

b) a polypeptide comprising a naturally occurring amino acid 
sequence at least 95% identical to the amino acid sequence of SEQ ID NO:37. 

32. An isolated polynucleotide selected from the group consisting of. . . : 

b) a polynucleotide comprising a naturally occurring polynucleotide 
sequence at least 95% identical to the polynucleotide sequence of SEQ ID NO:74.

From the above it should be apparent that the claims of the subject application are 
fundamentally different from those found invalid in Lilly and Fiers. The subject matter of the 
present claims is defined in terms of the chemical structure of SEQ ID NO:37 and SEQ ID 
NO:74. In the present case, there is no reliance merely on a description of functional 
characteristics of the polynucleotides and polypeptides recited by the claims. Such functional 
recitations that are included add to the structural characterization of the recited polypeptides and 
polynucleotides. The polynucleotides and polypeptides defined in the claims of the present 
application recite structural features, and cases such as Lilly and Fiers stress that the recitation of 
structure is an important factor to consider in a written description analysis of claims of this type. 
By failing to base its written description inquiry "on whatever is now claimed," the Final Office 
Action failed to provide an appropriate analysis of the present claims and how they differ from 
those found not to satisfy the written description requirement in Lilly and Fiers. 

II. The present claims do not define a genus which is "highly variant" 

Furthermore, the claims at issue do not describe a genus which could be characterized as 
"highly variant." (Final Office Action, page 35.) Available evidence illustrates that the claimed 
genus is of narrow scope. 

In support of this assertion, the Board's attention is directed to the enclosed reference by 
Brenner et al. ("Assessing sequence comparison methods with reliable structurally identified 
distant evolutionary relationships," Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078) (Reference 
No. 17). Through exhaustive analysis of a data set of proteins with known structural and 
functional relationships and with <90% overall sequence identity, Brenner et al. have determined
that 30% identity is a reliable threshold for establishing evolutionary homology between two 
sequences aligned over at least 150 residues. (Brenner et al., pages 6073 and 6076.) 
Furthermore, local identity is particularly important in this case for assessing the significance of 
the alignments, as Brenner et al. further report that >40% identity over at least 70 residues is 
reliable in signifying homology between proteins. (Brenner et al., page 6076.) 

The present application is directed, inter alia, to regulatory proteins related to the amino 
acid sequence of SEQ ID NO:37. In accordance with Brenner et al., naturally occurring
molecules may exist which could be characterized as regulatory proteins and which have as little 
as 40% identity over at least 70 residues to SEQ ID NO:37. The "variant language" of the 
present claims recites, for example, polynucleotides encoding "a polypeptide . . . comprising a 
naturally occurring amino acid sequence at least 95% identical to the amino acid sequence of 
SEQ ID NO:37" (note that SEQ ID NO:37 has 350 amino acid residues). This variation is far 
less than that of all potential regulatory proteins related to SEQ ID NO:37, i.e., those regulatory 
proteins having as little as 40% identity over at least 70 residues to SEQ ID NO:37. 
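
The comparison above can be made concrete with simple arithmetic. The Python sketch below is offered only as an illustration: it assumes identity is measured over the full 350-residue length of SEQ ID NO:37 without gaps, treats each threshold as "at least" the stated percentage (Brenner et al. state ">40%"), and uses the hypothetical helper names min_identical_positions and max_differing_positions.

```python
def min_identical_positions(length: int, percent: int) -> int:
    """Smallest number of identical positions giving at least `percent`
    identity over `length` aligned residues (integer ceiling division)."""
    return -(-length * percent // 100)


def max_differing_positions(length: int, percent: int) -> int:
    """Largest number of differing positions still giving at least
    `percent` identity over `length` aligned residues."""
    return length - min_identical_positions(length, percent)


if __name__ == "__main__":
    # Claimed scope: at least 95% identity to the 350-residue SEQ ID NO:37.
    print(min_identical_positions(350, 95), max_differing_positions(350, 95))
    # Brenner et al. local threshold: 40% identity over at least 70 residues.
    print(min_identical_positions(70, 40), max_differing_positions(70, 40))
```

Under these assumptions, a variant within the claimed scope may differ from SEQ ID NO:37 at no more than 17 of 350 positions, whereas a 70-residue region meeting the 40% threshold may differ at as many as 42 of its 70 positions.
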

III. The state of the art at the time of the present invention is further advanced than at 
the time of the Lilly and Fiers applications 

In the Lilly case, claims of U.S. Patent No. 4,652,525 were found invalid for failing to
comply with the written description requirement of 35 U.S.C. §112. The '525 patent claimed the
benefit of priority of two applications, Application Serial No. 801,343 filed May 27, 1977, and
Application Serial No. 805,023 filed June 9, 1977. In the Fiers case, party Revel claimed the
benefit of priority of an Israeli application filed on November 21, 1979. Thus, the written
description inquiry in those cases was based on the state of the art at essentially the "dark ages"
of recombinant DNA technology.

The present application has a priority date of June 6, 1997. Much has happened in the
development of recombinant DNA technology in the 17 or more years between the time of filing of
the applications involved in Lilly and Fiers and that of the present application. For example, the
technique of polymerase chain reaction (PCR) was invented. Highly efficient cloning and DNA 
sequencing technology has been developed. Large databases of protein and nucleotide 
sequences have been compiled. Much of the raw material of the human and other genomes has 
been sequenced. With these remarkable advances one of skill in the art would recognize that, 
given the sequence information of SEQ ID NO:37 and SEQ ID NO:74, and the additional 
extensive detail provided by the subject application, the present inventors were in possession of
the claimed polynucleotides encoding polypeptide variants and polypeptide fragments, the 
claimed polynucleotide variants, and the claimed polynucleotide fragments at the time of filing 
of this application. 

IV. The Examiner questions the value of Appellants' statements in the Responses filed 
February 20, 2003 and July 23, 2003 that "one of skill in the art would know how to use the 
BLAST program to determine 95% identity." (Final Office Action, pages 32 and 36.) Appellants
note that the claimed polynucleotides are sufficiently described. One of skill in the art would be 
able to describe polynucleotides comprising naturally occurring sequences that fall within the 
claimed limitations of percentage identity with SEQ ID NO:74 and to describe polynucleotides 
encoding polypeptides comprising naturally occurring sequences that fall within the claimed 
limitations of percentage identity with SEQ ID NO:37. 

The Examiner further alleges that "the instant genuses are not limited by functional 
attributes." (Final Office Action, pages 33 and 37.) However, functional limitations are not 
necessary as the structural and source limitations are sufficient to describe the claimed 
polynucleotide variants and claimed polynucleotides encoding polypeptide variants, and, in any 
case, "function" is irrelevant to the use of the claimed polynucleotide variants and claimed 
polynucleotides encoding polypeptide variants in toxicology testing. 

The Examiner states that Appellants' arguments are "not persuasive in light of the written 
description requirements which requires, in the absence of a recitation of a number of 
representative specifies [sic] of the genus, a function correlation with the disclosed single 
member of the genus." (Final Office Action, page 36.) Appellants note that function is not a
requirement for adequate written description.

V. The Examiner contends that "reliance on %90 or %95 [sic] sequence identity does not 
guarantee that the variants will have the same functional attributes as SEQ ID NO:37." (Final 
Office Action, pages 33 and 37-38.) As the claimed variants are not described by their having 
the same "function" as SEQ ID NO:37 or SEQ ID NO:74, the Examiner's arguments are not 
relevant to the written description issue. 

Nevertheless, Appellants note that it is well known in the art that sequence similarity is 
predictive of similarity in functional activity. Hegyi and Gerstein (H. Hegyi and M. Gerstein, 
"Annotation Transfer for Genomics: Measuring Functional Divergence in Multi-Domain 
Proteins," Genome Research (2001) 11:1632-1640; Reference No. 18) conclude that "the
probability that two single-domain proteins that have the same superfamily structure have the 
same function (whether enzymatic or not) is about 2/3." (Hegyi and Gerstein, Reference No. 18, 
page 1635.) Hegyi and Gerstein also concluded that, for multi-domain proteins with "almost 
complete coverage with exactly the same type and number of superfamilies, following each other 
in the same order" "[t]he probability that the functions are the same in this case was 91%." 
(Hegyi and Gerstein, Reference No. 18, page 1636.) Hegyi and Gerstein (Reference No. 18, 
page 1632) further note that 

Wilson et al. (2000) compared a large number of protein domains to one 
another in a pair-wise fashion with respect to similarities in sequence, structure, 
and function. Using a hybrid functional classification scheme merging the 
ENZYME and FlyBase systems (Gelbart et al. 1997; Bairoch 2000), they found 
that precise function is not conserved below 30-40% identity, although the broad 
functional class is usually preserved for sequence identities as low as 20-25%, 
given that the sequences have the same fold. Their survey also reinforced the
previously established general exponential relationship between structural and 
sequence similarity (Chothia and Lesk 1986). 

The polypeptides encoded by the claimed polynucleotides share at least 95% sequence
identity with the SEQ ID NO:37 polypeptide, well above the thresholds described in the Hegyi 
and Gerstein article (Reference No. 18) cited above. Therefore, there is a reasonable probability 
that the SEQ ID NO:37 polypeptide variants would have the same function as the SEQ ID NO:37 
polypeptide. 

VI. In the Response filed January 27, 2003, Appellants asserted that Brenner teaches that 
"30% identity is a reliable threshold for establishing evolutionary homology between two 
sequences aligned over at least 150 residues" and that ">40% identity over at least 70 residues is 
reliable in signifying homology between proteins," and that therefore the genus of
polypeptides at least 95% identical to SEQ ID NO:37 would more likely than not function 
similarly to the polypeptide of SEQ ID NO:37. 

The Examiner, however, dismisses Appellants' arguments, alleging that "Brenner is 
predicting evolutionary relationships within a database of orthologs which are identified 
independently of sequence comparison" and that evolutionary relationships are not predictive of 
functional relationships. (Final Office Action, pages 33 and 37.) 

In the Brenner paper the SCOP database was used as a test set to test the reliability of 
sequence comparison methods. The SCOP database used in the Brenner paper is a database of 
proteins with known structures. The relationships among the SCOP proteins are already known
based on non-sequence comparison methods. The structures and functions of the SCOP proteins 
do not need to be ascertained from sequence comparison methods. The Brenner results allow 
one to generalize to the much more common situation of NOT KNOWING the structural and 
functional relationships between two polypeptide sequences and trying to use sequence 
comparison methods to predict those relationships. As the Examiner acknowledges, Brenner 
does not discuss predicting functional similarity, but rather evolutionary relationships. 
(However, the "function" of the claimed polynucleotides or of the polypeptides encoded by the 
claimed polynucleotides is immaterial to the written description, given the description in the 
Specification and what is known to one of skill in the art.) Use of this database of proteins with 
known structures allowed the authors to determine whether homologies predicted from the 
sequence comparison methods tested in the article were truly similar structurally. Brenner is not 
trying to predict relationships between proteins; Brenner is evaluating known methods of 
predicting protein relationships. One cannot test the ability of sequence comparison methods in 
predicting actual structural homology if one starts with protein sequences whose structures were 
not already known previously and independently of the sequence comparison. 

VII. The Examiner asserts that "the instant genus claims are not limited by structural features" 
and that "the instant claims do not recite structural features, they recite only sequence 
homology." The Examiner further asserts that "[t]his is not the same [as] a structural feature 
such as a catalytic site or a binding site." (Final Office Action, pages 32-33.) 

Appellants note that the sequence of a polypeptide is well known in the art to constitute 
"structure." The amino acid sequence of a polypeptide is known as the "primary structure" of a 
polypeptide. For example, Stryer teaches that "[p]rimary structure is simply the sequence of 
amino acids and the location of disulfide bridges, if there are any. The primary structure is thus a 
complete description of the covalent connections of a protein." (L. Stryer, Biochemistry, 2nd 
edition, W.H. Freeman and Company, New York NY, 1981, page 32; Reference No. 19.) Claim 
25 limits the structure of the polypeptides encoded by the claimed polynucleotides to those 
naturally occurring amino acid sequences at least 95% identical to the amino acid sequence of 
SEQ ID NO:37. 



The Examiner further alleges that there is no limitation in the specification as to
conservation of glycosylation and phosphorylation sites between SEQ ID NO:37 and its variants.
Appellants refer the Board to the claims. The Specification adequately describes what is
claimed.

The Examiner alleges that "[n]either the specification nor claims identify common 
attributes shared by members of the genus in terms of use or function." (Final Office Action, 
page 33.) Appellants note that the claimed polynucleotides share structural attributes (% identity 
to SEQ ID NO:37 or SEQ ID NO:74). The claimed polynucleotides are naturally-occurring and 
thus share a "common use" in toxicology testing (see e.g., supra, Issues 1 and 3.) 

VIII. Summary 

The Final Office Action failed to base its written description inquiry "on whatever is now 
claimed." Consequently, the Action did not provide an appropriate analysis of the present claims 
and how they differ from those found not to satisfy the written description requirement in cases 
such as Lilly and Fiers. In particular, the claims of the subject application are fundamentally 
different from those found invalid in Lilly and Fiers. The subject matter of the present claims is 
defined in terms of the chemical structure of SEQ ID NO:37 or SEQ ID NO:74. The courts have 
stressed that structural features are important factors to consider in a written description analysis 
of claims to nucleic acids and proteins. In addition, the genus of polynucleotides defined by the 
present claims is adequately described, as evidenced by Brenner et al. Furthermore, there have 
been remarkable advances in the state of the art since the Lilly and Fiers cases, and these 
advances were given no consideration whatsoever in the position set forth by the Final Office 
Action. 



Issue 6: Provisional Double Patenting Rejection of Claims 25, 28, 29, 30, 32, 33, 39, 
41, and 42 

Claims 25, 28, 29, 30, 32, 33, 39, 41, and 42 were provisionally rejected under the 
judicially created doctrine of obviousness-type double patenting over Claims 1, 5, 6, and 7 of 
U.S. Application No. 09/539,800. While not conceding the propriety of the Examiner's position, 
Appellants are willing to submit a Terminal Disclaimer with respect to U.S. Application No. 
09/539,800 in the interest of expediting prosecution of the subject application, upon indication 
that the application is otherwise allowable. Therefore, it is requested that the Board indicate that 
the subject application will be allowable upon submission of such a Terminal Disclaimer.

(9) CONCLUSION

Appellants request that the rejections of the claims on appeal be reversed for at least the
above reasons. 

Appellants respectfully submit that rejections for lack of utility based, inter alia, on an 
allegation of "lack of specificity," as set forth in the Final Office Action and as justified in the 
Revised Interim and final Utility Guidelines and Training Materials, are not supported in the law. 
Neither are they scientifically correct, nor supported by any evidence or sound scientific 
reasoning. These rejections are alleged to be founded on facts in court cases such as Brenner and 
Kirk, yet those facts are clearly distinguishable from the facts of the instant application, and 
indeed most if not all nucleotide and protein sequence applications. Nevertheless, the PTO is 
attempting to mold the facts and holdings of these prior cases, "like a nose of wax," 4 to target 
rejections of claims to polypeptide and polynucleotide sequences, where biological activity 
information has not been proven by laboratory experimentation, and they have done so by 
ignoring perfectly acceptable utilities fully disclosed in the specifications as well as well- 
established utilities known to those of skill in the art. As is disclosed in the specification, and 
even more clearly, as one of ordinary skill in the art would understand, the claimed invention has 
well-established, specific, substantial and credible utilities. The rejections are, therefore, 
improper and should be reversed. 

Moreover, to the extent the above rejections were based on the Revised Interim and final 
Examination Guidelines and Training Materials, those portions of the Guidelines and Training 
Materials that form the basis for the rejections should be determined to be inconsistent with the 
law. 

Due to the urgency of this matter and its economic and public health implications, an 
expedited review of this appeal is earnestly solicited. 

If the USPTO determines that any additional fees are due, the Commissioner is hereby 
authorized to charge Deposit Account No. 09-0108. 
This brief is enclosed in triplicate. 



4 "The concept of patentable subject matter under §101 is not 'like a nose of wax which may be turned and twisted
in any direction * * *.' White v. Dunbar, 119 U.S. 47, 51." (Parker v. Flook, 198 USPQ 193 (US SupCt 1978))

APPENDIX - CLAIMS ON APPEAL

25. (Previously Presented) An isolated polynucleotide encoding a polypeptide selected 
from the group consisting of: 

a) a polypeptide comprising the amino acid sequence of SEQ ID NO:37, 

b) a polypeptide comprising a naturally occurring amino acid sequence at least 95% 
identical to the amino acid sequence of SEQ ID NO:37, and 

c) an immunogenic fragment of a polypeptide having the amino acid sequence of SEQ 
ID NO:37. 

26. (Previously Presented) An isolated polynucleotide encoding a polypeptide 
comprising the amino acid sequence of SEQ ID NO:37. 

27. (Previously Presented) An isolated polynucleotide of claim 26 comprising the 
polynucleotide sequence of SEQ ID NO:74. 

28. (Previously Presented) An isolated recombinant polynucleotide comprising a 
promoter sequence operably linked to a polynucleotide of claim 25. 

29. (Previously Presented) An isolated cell transformed with a recombinant 
polynucleotide of claim 28. 

30. (Previously Presented) A method of producing a polypeptide encoded by a 
polynucleotide of claim 25, the method comprising: 

a) culturing a cell under conditions suitable for expression of the polypeptide, wherein
said cell is transformed with a recombinant polynucleotide, and said recombinant
polynucleotide comprises a promoter sequence operably linked to a polynucleotide of
claim 25, and

b) recovering the polypeptide so expressed.

31. (Previously Presented) A method of claim 30, wherein the polypeptide comprises 
the amino acid sequence of SEQ ID NO:37. 

32. (Previously Presented) An isolated polynucleotide selected from the group 
consisting of: 

a) a polynucleotide comprising the polynucleotide sequence of SEQ ID NO:74,

b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least
95% identical to the polynucleotide sequence of SEQ ID NO:74,

c) a polynucleotide completely complementary to the polynucleotide of a) over the 
entire length of the polynucleotide of a), and 

d) a polynucleotide completely complementary to the polynucleotide of b) over the 
entire length of the polynucleotide of b). 

33. (Previously Presented) An isolated polynucleotide comprising at least 60 contiguous 
nucleotides of a polynucleotide of claim 32. 

39. (Previously Presented) A microarray wherein at least one element of the microarray
is a polynucleotide of claim 43.

41. (Previously Presented) An array comprising different nucleotide molecules affixed
in distinct physical locations on a solid substrate, wherein at least one of said nucleotide 
molecules comprises a first oligonucleotide or polynucleotide sequence completely 
complementary to 20 contiguous nucleotides of a target polynucleotide, and wherein said target 
polynucleotide is a polynucleotide of claim 32. 

43. (Previously Presented) An isolated polynucleotide comprising 20 contiguous 
nucleotides of a polynucleotide of claim 32. 

44. (Previously Presented) An isolated polynucleotide of claim 25 encoding a 
polypeptide comprising an amino acid sequence at least 95% identical to the amino acid 
sequence of SEQ ID NO:37 encoded by an allele of SEQ ID NO:74.

45. (Previously Presented) An isolated polynucleotide of claim 32 selected from the 
group consisting of: 

a) a polynucleotide comprising a sequence of an allele of SEQ ID NO:74 at least 
95% identical to the polynucleotide sequence of SEQ ID NO:74, 

b) a polynucleotide completely complementary to the polynucleotide of a) over the 
entire length of the polynucleotide of a).

differentially expressed genes in healthy and diseased subjects

Cross Reference to Related Applications: 
This application is a continuation-in-part application of U.S. Serial No.
08/195,485 filed February 14, 1994, the contents of which are incorporated herein by
reference. 

Field of the Invention 

The present invention relates to the use of immobilized
oligonucleotide/polynucleotide or polynucleotide sequences for the identification,
sequencing and characterization of genes which are implicated in disease, infection,
or development and the use of such identified genes and the proteins encoded thereby
in diagnosis, prognosis, therapy and drug discovery.

Background of the Invention 

Identification, sequencing and characterization of genes, especially
human genes, is a major goal of modern scientific research. By identifying genes,
determining their sequences and characterizing their biological function, it is possible
to employ recombinant DNA technology to produce large quantities of valuable "gene
products", e.g., proteins and peptides. Additionally, knowledge of gene sequences
can provide a key to diagnosis, prognosis and treatment of a variety of disease states
in plants and animals which are characterized by inappropriate expression and/or
repression of selected gene(s) or by the influence of external factors, e.g., carcinogens
or teratogens, on gene function. The term disease-associated gene(s) is used herein
in its broadest sense to mean not only genes associated with classical inherited
diseases, but also those associated with genetic predisposition to disease as well as
infectious or pathogenic states resulting from gene expression by infectious agents or
the effect on host cell gene expression by the presence of such a pathogen or its
products. Locating disease-associated genes will permit the development of
diagnostic and prognostic reagents and methods, as well as possible therapeutic
regimens, and the discovery of new drugs for treating or preventing the occurrence of
such diseases.

Methods have been described for the identification of certain novel
gene sequences, referred to as Expressed Sequence Tags (EST) [see, e.g., Adams et
al., Science, 252:1651-1656 (1991); and International Patent Application No.
WO93/00353, published January 7, 1993]. Conventionally, an EST is a specific cDNA
polynucleotide sequence, or tag, about 150 to 400 nucleotides in length, derived from
a messenger RNA molecule by reverse transcription, which is a marker for, and
component of, a human gene actually transcribed in vivo. However, as used herein an
EST also refers to a genomic DNA fragment derived from an organism, such as a
microorganism, the DNA of which lacks intron regions.

A variety of techniques have been described for identifying particular
gene sequences on the basis of their gene products. For example, several techniques
are described in the art [see, e.g., International Patent Application No. WO91/07087,
published May 30, 1991]. Additionally, known methods exist for the amplification of
desired sequences [see, e.g., International Patent Application No. WO91/17271,
published November 14, 1991, among others].

However, at present, there exist no established methods for filling the
need in the art for methods and reagents which employ fragments of differentially
expressed genes of known, unknown (or previously unrecognized) function or
consequence to provide diagnostic and therapeutic methods and reagents for diagnosis
and treatment of disease or infection, which conditions are characterized by such
genes and gene products. It should be appreciated that it is the expression differences
that are diagnostic of the altered state (e.g., predisease, disease, pathogenic,
progression or infectious). Such genes associated with the altered state are likely to
be the targets of drug discovery, whether the genes are the cause or the effect of the
condition. Identification of such genes provides insight into which gene expression
needs to be re-altered in order to reestablish the healthy state.

Summary of the Invention 

In one aspect, the invention provides methods for identifying gene(s)
which are differentially expressed, for example, in a normal healthy organism and an
organism having a disease. The method involves producing and comparing
hybridization patterns formed between samples of expressed mRNA or cDNA
polynucleotide sequences obtained from either analogous cells, tissues or organs of a
healthy organism and a diseased organism and a defined set of
oligonucleotide/polynucleotide sequence probes from either a
healthy organism or a diseased organism immobilized on a support. Those defined
oligonucleotide/polynucleotide sequences are representative of the total expressed
genetic component of the cells, tissues, organs or organism as defined by the collection
of partial cDNA sequences (ESTs). The differences between the hybridization
patterns permit identification of those particular EST or gene-specific
oligonucleotide/polynucleotide sequences associated with differential expression, and
the identification of the EST permits identification of the clone from which it was
derived and, using ordinary skill, further cloning and, if desired, sequencing of the
full-length cDNA and genomic counterpart, i.e., gene, from which it was obtained.
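
The comparison of hybridization patterns described above can be sketched in a few
lines of Python. This is only an illustrative sketch under stated assumptions: the
region coordinates, the EST identifiers and the function name `differential_ests` are
hypothetical and do not come from the specification, and each pattern is modeled
simply as the set of support positions at which hybridization was detected.

```python
# Illustrative sketch only; all identifiers and data below are hypothetical.

def differential_ests(healthy_hits, diseased_hits, region_to_est):
    """Return the ESTs whose probes hybridize in only one of the two samples.

    healthy_hits / diseased_hits: sets of (row, col) positions with a detected signal.
    region_to_est: maps each position on the support to the EST its probe came from.
    """
    only_healthy = healthy_hits - diseased_hits    # signal lost in the diseased sample
    only_diseased = diseased_hits - healthy_hits   # signal gained in the diseased sample
    return {
        "healthy_only": sorted({region_to_est[r] for r in only_healthy}),
        "diseased_only": sorted({region_to_est[r] for r in only_diseased}),
    }


# Hypothetical 2 x 2 support with one probe per position.
region_to_est = {(0, 0): "EST_A", (0, 1): "EST_B", (1, 0): "EST_C", (1, 1): "EST_D"}
healthy = {(0, 0), (0, 1), (1, 0)}
diseased = {(0, 0), (1, 0), (1, 1)}

result = differential_ests(healthy, diseased, region_to_est)
print(result)
```

Because every position on the support is tied to a known probe, a difference between
the two patterns points directly back to an EST, and from there to its clone.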

In another aspect, the invention provides methods substantially similar
to those described above, but which permit identification of those gene(s) of a
pathogen which are expressed in any biological sample of an infected organism based
on comparative hybridization of RNA/cDNA samples derived from a healthy versus
infected organism, hybridized to an oligonucleotide/polynucleotide set representative
of the gene coding complement of the pathogen of interest.

In another aspect, the invention provides methods substantially similar
to those described above, but which permit identification of those EST-specific
oligonucleotide/polynucleotide sequences of host gene(s) which represent genes being
differentially expressed/altered in expression by the disease state or infection and are
expressed in any biological sample of an infected organism based on comparative
hybridization of RNA/cDNA samples derived from a healthy versus infected
organism of interest.

In a further aspect, the methods described above and in detail below
also provide methods for diagnosis of diseases or infections characterized by
differentially expressed genes, the expression of which has been altered as a result of
infection by the pathogen or disease causing agent in question. All identified
differences provide the basis for diagnostic testing, be it the altered expression of
endogenous genes or the patterned expression of the genes of the infecting organism.
Such patterns of altered expression are defined by comparing RNA/cDNA from the
two states hybridized against a panel of oligonucleotide/polynucleotides representing
the expressed gene component of a cell, tissue, organ or organism as defined by its
collection of ESTs.

Yet a further aspect of this invention provides a composition suitable
for use in hybridization, which comprises a solid surface on which is immobilized at
pre-defined regions thereon a plurality of defined oligonucleotide/polynucleotide
sequences for hybridization, each sequence comprising a fragment of an EST isolated
from a cDNA or DNA library prepared from at least one selected tissue or cell
sample of a healthy (i.e., pre-disease state) animal, at least one analogous sample of
an animal having a disease, at least one analogous sample of an animal infected with a
pathogen or the pathogen itself, or any combination or multiple combinations thereof.

An additional aspect of the invention provides an isolated gene
sequence which is differentially expressed in a normal healthy animal and an animal
having a disease, and is identified by the methods above. Similarly, an isolated
pathogen gene sequence which is expressed in tissue or cell samples of an infected
animal can be identified by the methods above.

Yet another aspect of the invention is that it provides not only a means
for a static diagnostic but also a means for carrying out the procedure over
time to measure disease progression as well as monitoring the efficacy of disease
treatment regimes including any toxicological effects thereof.

Another aspect of the invention is an isolated protein produced by
expression of the gene sequences identified above. Such proteins are useful in
therapeutic compositions or diagnostic compositions, or as targets for drug
development.

Other aspects and advantages of the present invention are described
further in the following detailed description of the preferred embodiments thereof.

Detailed Description of the Invention

The present invention meets the unfulfilled needs in the art by
providing methods for the identification and use of gene fragments and genes, even
those of unknown full length sequence and unknown function, which are
differentially expressed in a healthy animal and in an animal having a specific disease
or infection by use of ESTs derived from DNA libraries of healthy and/or
diseased/infected animals. Employing the methods of this invention permits the
resulting identification and isolation of such genes by using their corresponding ESTs
and thereby also permits the production of protein products encoded by such genes.
The genes themselves and/or protein products, if desired, may be employed in the
diagnosis or therapy of the disease or infection with which the genes are associated
and in the development of new drugs therefor.

It has been appreciated that one or more differentially identified EST
or gene-specific oligonucleotide/polynucleotides define a pattern of differentially
expressed genes diagnostic of a predisease, disease or infective state. A knowledge of
the specific biological function of the EST is not required, only that the EST
identifies a gene or genes whose altered expression is associated reproducibly with
the predisease, disease or infectious state. The differences permit the identification of
gene products altered in their expression by the disease and represent those products
most likely to be targets of therapeutic intervention. Similarly, the product may be of
the infecting organism itself and also be an effective target of intervention.

I. Definitions.

Several words and phrases used throughout this specification are
defined as follows:

As used herein, the term "gene" refers to the genomic nucleotide
sequence from which a cDNA sequence is derived, which cDNA produces an EST, as
described below. The term gene classically refers to the genomic sequence, which,
upon processing, can produce different cDNAs, e.g., by splicing events. However,
for ease of reading, any full-length counterpart cDNA sequence which gives rise to an
EST will also be referred to by shorthand herein as a 'gene'.

The term "organism" includes, without limitation, microbes, plants and
animals.

The term "animal" is used in its broadest sense to include all members
of the animal kingdom, including humans. It should be understood, however, that
according to this invention the same species of animal which provides the biological
sample also is the source of the defined immobilized oligonucleotide/polynucleotides
as defined below.

The term "pathogen" is defined herein as any molecule or organism
which is capable of infecting an animal or plant and replicating its nucleic acid
sequences in the cells or tissues of that animal or plant. Such a pathogen is generally
associated with a disease condition in the infected animal or plant. Such pathogens
may include viruses, which replicate intra- or extra-cellularly, or other organisms,
such as bacteria, fungi or parasites, which generally infect tissues or the blood.
Certain pathogens or microorganisms are known to exist in sequential and
distinguishable stages of development, e.g., latent stages, infective stages, and stages
which cause symptomatic diseases. In these different stages, the pathogens are
anticipated to express differentially certain genes and/or turn on or off host cell gene
expression.

As used herein, the term "disease" or "disease state" refers to any
condition which deviates from a normal or standardized healthy state in an organism
of the same species in terms of differential expression of the organism's genes. In
other words, a disease state can be any illness or disorder, be it of genetic or
environmental origin, for example, an inherited disorder such as certain breast
cancers, or a disorder which is characterized by expression of gene(s) normally in an
inactive, 'turned off' state in a healthy animal, or a disorder which is characterized by
under-expression or no expression of gene(s) which is normally activated or 'turned
on' in a normal healthy animal. Such differential expression of genes may also be
detected in a condition caused by infection, inflammation, or allergy, a condition
caused by development or aging of the animal, a condition caused by administration
of a drug or exposure of the animal to another agent, e.g., nutrition, which affects
gene expression. Essentially, the methods described herein can be adapted to detect
differential gene expression resulting from any cause, by manipulation of the defined
oligonucleotide/polynucleotides and the samples tested as described below. The
concept of disease or disease state also includes its temporal aspects in terms of

The phrase "differentially expressed" refers to those situations in
which a gene transcript is found in differing numbers of copies, or in activated vs
inactivated states, in different cell types or tissue types of an organism having a
selected disease as contrasted to the levels of the gene transcript found in the same
cells or tissues of a healthy organism. Genes may be differentially expressed in
differing states of activation in microorganisms or pathogens in different stages of
development. For example, multiple copies of gene transcripts may be found in an
organism having a selected disease, while only one, or significantly fewer copies, of
the same gene transcript are found in a healthy organism, or vice-versa.
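
As a rough illustration of this definition, the following Python sketch labels
transcripts by comparing copy numbers between a healthy and a diseased sample. It
rests on assumptions that are not part of the specification: the two-fold threshold,
the pseudocount used to avoid division by zero, the transcript names and the function
name `classify_expression` are all hypothetical.

```python
# Illustrative sketch only; threshold, pseudocount and data are hypothetical.

def classify_expression(healthy_copies, diseased_copies, min_fold=2.0, pseudocount=1.0):
    """Label each transcript by comparing its copy number in two samples."""
    labels = {}
    for transcript in sorted(set(healthy_copies) | set(diseased_copies)):
        # A missing transcript counts as zero copies; the pseudocount keeps the
        # ratio finite when a transcript is absent from the healthy sample.
        h = healthy_copies.get(transcript, 0) + pseudocount
        d = diseased_copies.get(transcript, 0) + pseudocount
        ratio = d / h
        if ratio >= min_fold:
            labels[transcript] = "higher in disease"
        elif ratio <= 1.0 / min_fold:
            labels[transcript] = "lower in disease"
        else:
            labels[transcript] = "not differential"
    return labels


healthy_copies = {"gene_1": 1, "gene_2": 40, "gene_3": 10}
diseased_copies = {"gene_1": 25, "gene_2": 3, "gene_3": 11}

labels = classify_expression(healthy_copies, diseased_copies)
print(labels)
```

The symmetric threshold reflects the "or vice-versa" in the definition: a transcript
may be differentially expressed because it is more abundant in either state.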

As used herein, the term "solid support" refers to any known substrate
which is useful for the immobilization of large numbers of
oligonucleotide/polynucleotide sequences by any available method to enable
detectable hybridization of the immobilized oligonucleotide/polynucleotide sequences
with other polynucleotide sequences in a sample. Among a number of available solid
supports, one desirable example is the supports described in International Patent
Application No. WO91/07087, published May 30, 1991. Also useful are supports such
as but not limited to nitrocellulose, mylein, glass, silica and Pall Biodyne C®. It is
also anticipated that improvements yet to be made to conventional solid supports may
also be employed in this invention.

The term "surface" means any generally two-dimensional structure on
a solid support to which the desired oligonucleotide/polynucleotide sequence is
attached or immobilized. A surface may have steps, ridges, kinks, terraces and the
like.

As used herein, the term "predefined region" refers to a localized area
on a surface of a solid support on which is immobilized one or multiple copies of a
particular oligonucleotide/polynucleotide sequence and which enables the
identification of the oligonucleotide/polynucleotide at the position, if hybridization of
that oligonucleotide/polynucleotide to a sample polynucleotide occurs.

The term "immobilized" refers to the attachment of the
oligonucleotide/polynucleotide to the solid support. Means of immobilization are
known and conventional to those of skill in the art, and may depend on the type of
support being used.

By "EST" or "Expressed Sequence Tag" is meant a partial DNA or
cDNA sequence of about 150 to 500, more preferably about 300, sequential
nucleotides of a longer sequence obtained from a genomic or cDNA library prepared
from a selected cell, cell type, tissue or tissue type, organ or organism, which longer
sequence corresponds to an mRNA of a gene found in that library. An EST is
generally DNA. One or more libraries made from a single tissue type typically
provide at least about 3000 different (i.e., unique) ESTs and potentially the full
complement of all possible ESTs representing all cDNAs, e.g., 50,000-100,000 in an
animal such as a human. Further background and information on the construction of
ESTs is described in M. D. Adams et al., Science, 252:1651-1656 (1991); and
International Application Number PCT/US92/05222 (January 7, 1993).

As used herein, the term "defined oligonucleotide/polynucleotide
sequence" refers to a known nucleotide sequence fragment of a selected EST or gene.
This term is used interchangeably with the term "fragments of EST". These
sequential sequences are generally comprised of between about 15 to about 45
nucleotides and more preferably between about 20 to about 25 nucleotides in length.
Thus any single EST of 300 nucleotides in length may provide about 280 different
defined oligonucleotide/polynucleotide sequences of 20 nucleotides in length (e.g.,
20-mers). The lengths of the defined oligonucleotide/polynucleotides may be readily
increased or decreased as desired or needed, depending on the limitations of the solid
support on which they may be immobilized or the requirements of the hybridization
conditions to be employed. The length is generally guided by the principle that it
should be of sufficient length to ensure that it is, on average, represented only once in
the population to be examined. Generally, these defined
oligonucleotide/polynucleotides are RNA or DNA and are preferably derived from
the anti-sense strand of the EST sequence or from a corresponding mRNA sequence
to enable their hybridization with samples of RNA or DNA. Modified nucleotides
may be incorporated to increase stability and hybridization properties.
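
The arithmetic behind the figure of "about 280" fragments can be checked with a short
Python sketch. A sequence of 300 nucleotides has 300 - 20 + 1 = 281 window positions
of length 20. The sketch is illustrative only: the randomly generated sequence and the
function names are hypothetical, and the uniqueness estimate assumes equal and
independent base frequencies, which real sequences do not satisfy.

```python
# Illustrative sketch only; the sequence and function names are hypothetical.
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the anti-sense (reverse-complement) strand of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_fragments(est, length=20):
    """Return every contiguous fragment of the given length, in order."""
    return [est[i:i + length] for i in range(len(est) - length + 1)]


def expected_random_matches(length, population_size):
    """Expected chance occurrences of one fragment among `population_size`
    positions, assuming all four bases are equally likely and independent."""
    return population_size / 4 ** length


rng = random.Random(0)                                  # fixed seed, arbitrary data
est = "".join(rng.choice("ACGT") for _ in range(300))   # hypothetical 300-nt EST

fragments = tile_fragments(est, 20)
probes = [reverse_complement(f) for f in fragments]     # anti-sense 20-mers

print(len(fragments))                                   # number of window positions
print(expected_random_matches(20, 10 ** 9))             # far below one chance match
```

Under that simplified model a fragment becomes long enough once the expected number
of chance matches in the population falls to about one or below, which is the
principle the paragraph above uses to guide the choice of length.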

By the term "plurality of defined oligonucleotide/polynucleotide
sequences" is meant the following. A surface of a solid support may immobilize a
large number of "defined oligonucleotide/polynucleotides". For example, depending
upon the nature of the surface, it can immobilize from about 300 to upwards of
60,000 defined 20-mer oligonucleotide/polynucleotides. It is anticipated that future
improvements to solid surfaces will permit considerably larger such pluralities to be
immobilized on a single surface. A "plurality" of sequences refers to the use on any
one solid support of multiple different defined oligonucleotide/polynucleotides from a
single EST from a selected library, as well as multiple different defined
oligonucleotide/polynucleotides from different ESTs from the same library or many
libraries from the same or different tissues, and may also include multiple identical
copies of defined oligonucleotide/polynucleotides. Ultimately a plurality has at least
one oligonucleotide/polynucleotide per expressed gene in the entire organism. For
example, from a library producing about 5,000-10,000 ESTs, a single support can
include at least about 1-20 defined oligonucleotide/polynucleotides representing every
EST in that library. The composition of defined oligonucleotide/polynucleotides
which make up a surface according to this invention may be selected or designed as
desired.

The term "sample" is employed in the description of this invention in
several important ways. As used herein, the term "sample" encompasses any cell or
tissue from an organism. Any desired cell or tissue type in any desired state may be
selected to form a sample. For example, the sample cell desired may be a human T
cell; the desired cell type for use in this invention may be a quiescent T cell or an
activated T cell.

By the phrase "analogous sample" or "analogous cell or tissue" is
meant that according to this invention when the ESTs which provide the defined
oligonucleotide/polynucleotides are produced from a cDNA library prepared from a
single tissue or cell type source sample, e.g., liver tissue of a human, then the samples
used to hybridize to those immobilized defined oligonucleotide/polynucleotides are
preferably provided by the same type of sample from either a healthy or diseased
animal, i.e., liver tissue of a healthy human and liver tissue of a diseased or infected
human or from a human suspected of having that disease or infection. Alternatively,
if the surface contains defined oligonucleotide/polynucleotides from multiple cells or
tissues, then the "samples" which are hybridized thereto can be but are not limited to
samples obtained from analogous multiple tissues or cells.

The term "detectably hybridizing" means that the sample from the
healthy organism or diseased or infected organism is contacted with the defined
oligonucleotide/polynucleotides on the surface for sufficient time to permit the
formation of patterns of hybridization on the surfaces caused by hybridization
between certain polynucleotide sequences in the samples with the certain immobilized
defined oligonucleotide/polynucleotides. These patterns are made detectable by the
use of available conventional techniques, such as fluorescent labelling of the samples.
Preferably hybridization takes place under stringent conditions, e.g., revealing
homologies of about 95%. However, if desired, other less stringent conditions may
be selected. Techniques and conditions for hybridization at selected stringencies are
well known in the art [see, e.g., Sambrook et al., Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989)].

II. Compositions of the Invention

The present invention is based upon the use of ESTs from any desired
cell or tissue in known technologies for oligonucleotide/polynucleotide hybridization.

A. ESTs

An EST, as defined above, is, for an animal, a sequence from a
cDNA clone that corresponds to an mRNA. The EST sequences useful in the present
invention are isolated preferably from cDNA libraries using a rapid screening and
sequencing technique. Custom made cDNA libraries are made using known
techniques. See, generally, Sambrook et al., cited above. Briefly, mRNA from a
selected cell or tissue is reverse transcribed into complementary DNA (cDNA) using
the reverse transcriptase enzyme and made double-stranded using RNase H coupled
with DNA polymerase or reverse transcriptase. Restriction enzyme sites are added to
the cDNA and it is cloned into a vector. The result is a cDNA library. Alternatively,
commercially available cDNA libraries may be used. Libraries of cDNA can also be
generated from recombinant expression of genomic DNA using known techniques,
including polymerase chain reaction-derived techniques.

ESTs (which can range from about 150 to about 500 nucleotides in
length, preferably about 300 nucleotides) can be obtained through sequence analysis
from either end of the cDNA insert. Desirably, the DNA libraries used to obtain
ESTs use directional cloning methods so that either the 5' end of the cDNA (likely to
contain coding sequence) or the 3' end (likely to be a non-coding sequence) can be
selectively obtained.

In general, the method for obtaining ESTs comprises applying
conventional automated DNA sequencing technology to screen clones,
advantageously randomly selected clones, from a cDNA library. The cDNA libraries
from the desired tissue can be preprocessed, or edited, by conventional techniques to
reduce repeated sequencing of high and intermediate abundance clones and to
maximize the chances of finding rare messages from specific cell populations.
Preferably, preprocessing includes the use of defined composition prescreening
probes, e.g., cDNA corresponding to mitochondria, abundant sequences, ribosomes,
actins, myelin basic polypeptides, or any other known high abundance peptide. These
prescreening probes used for preprocessing are generally derived from known ESTs.
Other useful preprocessing techniques include subtraction hybridization, which
preferentially reduces the population of highly represented sequences in the library
[e.g., see Fargnoli et al., Anal. Biochem., 182:364 (1990)] and normalization, which
results in all sequences being represented in approximately equal proportions in the
library [Patanjali et al., Proc. Natl. Acad. Sci. USA, 88:1943 (1991)]. Additional
prescreening/differential screening approaches are known to those skilled in the art.

ESTs can then be generated from partial DNA sequencing of the 
selected clones. The ESTs useful in the present invention are preferably generated 
using low redundancy of sequencing, typically a single sequencing reaction. While 
single sequencing reactions may have an accuracy as low as 90%, this nevertheless
provides sufficient fidelity for identification of the sequence and design of PCR 
primers. 

If desired, the location of an EST in a full length cDNA is determined
by analyzing the EST for the presence of coding sequence. A conventional computer
program is used to predict the extent and orientation of the coding region of a
sequence (using all six reading frames). Based on this information, it is possible to
infer the presence of start or stop codons within a sequence and whether the sequence
is completely coding or completely non-coding or a combination of the two. If start
or stop codons are present, then the EST can cover part of the 5'-untranslated or
3'-untranslated part of the mRNA (respectively) as well as part of the coding
sequence. If no coding sequence is present, it is likely that the EST is derived from
the 3' untranslated sequence due to its longer length and the fact that most cDNA
library construction methods are biased toward the 3' end of the mRNA. It should be
understood that both coding and non-coding regions may provide ESTs equally useful
in the described invention.
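
By way of illustration only, the six-frame analysis described above may be sketched as follows. This is a minimal sketch and not the "conventional computer program" referred to in the text; the function names, the restriction to the bases A, C, G and T, and the use of ATG as the only start codon are assumptions made for the example.

```python
# Hypothetical illustration: locate start and stop codons in all six
# reading frames of an EST. All names here are assumptions for the example.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons(seq: str, offset: int):
    """Yield (position, codon) for each complete codon starting at offset."""
    for i in range(offset, len(seq) - 2, 3):
        yield i, seq[i:i + 3]


def scan_six_frames(est: str) -> dict:
    """Map each frame label (+1..+3, -1..-3) to its start/stop positions.

    Positions are 0-based and relative to the strand being read.
    """
    result = {}
    for strand, seq in (("+", est.upper()), ("-", reverse_complement(est))):
        for offset in range(3):
            starts = [i for i, c in codons(seq, offset) if c == START]
            stops = [i for i, c in codons(seq, offset) if c in STOPS]
            result[f"{strand}{offset + 1}"] = {"starts": starts, "stops": stops}
    return result


def open_frames(est: str) -> list:
    """Frames containing no stop codon, i.e. candidates for coding sequence."""
    return [f for f, hits in scan_six_frames(est).items() if not hits["stops"]]
```

A frame with no stop codon across the whole EST is only a candidate for coding sequence; a real gene-prediction program would also weigh codon usage and sequencing error, which this sketch ignores.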

A number of specific ESTs suitable for use in the present
invention are described in Adams et al. (supra), which may be incorporated by
reference herein to describe non-essential examples of desirable ESTs. Other ESTs
exist in the art which may also be useful in this invention, as will ESTs yet to be
developed by these known techniques.

B. Preparing the Solid Support of the Invention 

Oligonucleotide sequences which are fragments of defined
sequence are derived from each EST by conventional means, e.g., conventional
chemical synthesis or recombinant techniques. Each defined
oligonucleotide/polynucleotide sequence as described above is a fragment, which can be, but
is not necessarily, an anti-sense fragment, of an EST isolated from a DNA library
prepared from a selected cell or tissue type from a selected animal. For use in the
present invention, it is presently preferred that the defined
oligonucleotide/polynucleotide sequences are 20-25mers. As described above, for
each EST a number of such 20-25mers may be generated. The lengths may vary as
described above, as may the composition. For example,
oligonucleotide/polynucleotides can be modified based on the Oligo 4.0 or similar
programs to predict hybridization potential or to include modified nucleotides for the
reasons given above. It is also appreciated that large DNA segments may be
employed, including entire ESTs or even full length genes, particularly when inserted
into cloning vectors.
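
As a non-limiting sketch of how a number of 20-25mers may be derived from one EST, the following tiles an EST sequence into fixed-length fragments and optionally returns them as anti-sense fragments. The function name, the tiling step and the default values are assumptions for illustration; the text does not prescribe how fragments are chosen, and programs such as Oligo 4.0 apply hybridization criteria that are not modelled here.

```python
# Hypothetical illustration: derive defined 20-25mer fragments from an EST.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_oligos(est: str, length: int = 25, step: int = 25,
                antisense: bool = False) -> list:
    """Cut an EST into fragments of the given length, advancing by step.

    length is restricted to the 20-25 nucleotide range preferred in the text.
    A trailing stretch shorter than length is discarded.
    """
    if not 20 <= length <= 25:
        raise ValueError("length must be between 20 and 25 nucleotides")
    if step < 1:
        raise ValueError("step must be at least 1")
    seq = est.upper()
    oligos = [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]
    if antisense:
        # Anti-sense fragments are the reverse complements of the sense ones.
        oligos = [reverse_complement(o) for o in oligos]
    return oligos
```

A step smaller than the fragment length yields overlapping fragments, which gives several defined sequences per EST at the cost of redundancy on the support.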



WO 95/21944 PCT/US95/01863

A plurality of these "defined oligonucleotide/polynucleotide
sequences" are then attached to a selected solid support conventionally used for the
attachment of nucleotide sequences, again by known means. In contrast to other
technologies available in the art, this support is designed to contain defined, not
random, oligonucleotide/polynucleotide sequences. The EST fragments, or defined
oligonucleotide/polynucleotide sequences, immobilized on the solid support can
include fragments of one or more ESTs from a library of at least one selected tissue
or cell sample of a healthy animal, at least one analogous sample of the animal having
a disease, at least one analogous sample of the animal infected with a pathogen, and
any combination thereof.

Numerous conventional methods are employed for attaching
biological molecules such as oligonucleotide/polynucleotide sequences to surfaces of
a variety of solid supports. See, e.g., Affinity Techniques. Enzyme Purification: Part
B, Methods in Enzymology, Vol. 34, ed. W.B. Jakoby, M. Wilcheck, Acad. Press,
NY (1974); Immobilized Biochemical and Affinity Chromatography, Advances in
Experimental Medicine and Biology, vol. 42, ed. R. Dunlap, Plenum Press, NY
(1974); U. S. Patent No. 4,762,881; U. S. Patent No. 4,542,102; European Patent
Publication No. 391,608 (October 10, 1990); U. S. Patent No. 4,992,127 (Nov. 21,
1989).

One desirable method for attaching
oligonucleotide/polynucleotide sequences derived from ESTs to a solid support is
described in International Application No. PCT/US90/06607 (published May 30,
1991). Briefly, this method involves forming predefined regions on a surface of a
solid support, where the predefined regions are capable of immobilizing ESTs. The
methods make use of binding substances attached to the surface which enable
selective activation of the predefined regions. Upon activation, these binding
substances become capable of binding and immobilizing
oligonucleotide/polynucleotides based on EST or longer gene sequences.

Any of the known solid substrates suitable for binding
oligonucleotide/polynucleotides at pre-defined regions on the surface thereof for
hybridization and methods for attaching the oligonucleotide/polynucleotides thereto
may be employed by one of skill in the art according to this invention. Similarly,
known conventional methods for making hybridization of the immobilized
oligonucleotide/polynucleotides detectable, e.g., fluorescence, radioactivity,
photoactivation, biotinylation, solid state circuitry, and the like may be used in this
invention.

Thus, by resorting to known techniques, the invention provides
a composition suitable for use in hybridization which consists of a surface of a solid
support on which is immobilized at pre-defined regions on said surface a plurality of
defined oligonucleotide/polynucleotide sequences for hybridization. For example,
one composition of this invention is a solid support on which are immobilized oligos
of EST fragments from a library constructed from a single cell type, e.g., a human
stem cell, or a single tissue, e.g., human liver, from a healthy human. Still another
composition of this invention is another solid support on which are immobilized
oligos of EST fragments from a library constructed from a single cell type or a tissue
from a human having a selected disease or predisposition to a selected disease, e.g.,
liver cancer.

Another embodiment of the compositions of this invention
includes a single solid support having oligonucleotides of ESTs from both single cell
or single tissue libraries from both a healthy and diseased human. Still other
embodiments include a single support on which are immobilized oligos of EST
fragments from more than one tissue or cell library from a healthy human or a single
support on which are immobilized more than one tissue or cell library from both
healthy and diseased animals or humans. A preferred composition of this invention is
anticipated to be a single support containing oligos of ESTs for all known cells and
tissues from a selected organism.

III. The Methods of the Invention

A. Identification of Genes 

The present invention employs the compositions described
above in methods for identifying genes which are differentially expressed in a normal
healthy organism and an organism having a disease or infection. These methods may
be employed to detect such genes, regardless of the state of knowledge about the
function of the gene. The method of this invention, by use of the compositions
containing multiple defined EST fragments from a single gene as described above, is
able to detect levels of expression of genes, or in other cases simply the expression or
lack thereof, which differ between normal, healthy organisms and organisms having a
selected disease, disorder or infection.

One such method employs a first surface of a solid support on
which is immobilized at pre-defined regions thereon a plurality of defined
oligonucleotide/polynucleotide sequences, described above, of EST or longer gene
fragment isolated from a cDNA library prepared from at least one selected tissue or
cell sample of a healthy animal (the "healthy test surface") and a second such surface
on which is immobilized at pre-defined regions a plurality of defined
oligonucleotide/polynucleotide sequences of EST or longer gene fragment isolated
from at least one analogous tissue of an animal having a selected disease (the "disease
test surface"). These test surfaces may be standardized for the selected animal or
selected cell or tissue sample from that animal (i.e., they are prescreened for
polymorphisms in the species population).

Polynucleotide sequences are then isolated from mRNA and/or 
cDNA from a biological sample from a known healthy animal ("healthy control") and
a second sample is similarly prepared from a sample from a known diseased animal 
("disease sample"). These two samples are desirably selected from the cell or tissue 
analogous to that which provided the immobilized oligonucleotide/polynucleotides. 

According to the method, the healthy control sample is
contacted with one set of the healthy test surface and the disease test surface
described above for a time sufficient to permit detectable hybridization to occur
between the sample and the immobilized defined oligonucleotide/polynucleotides on
each surface. The results of this hybridization are a first hybridization pattern formed
between the nucleotides of healthy control and the healthy test surface and a second
hybridization pattern formed between the nucleotides of healthy control sample and
the disease test surface.

In a similar manner, the disease sample is detectably hybridized 
to another set of healthy test and disease test surfaces, forming a third hybridization 
pattern between the disease sample and healthy test surface and a fourth hybridization 

pattern between the disease sample and the disease test surface.

Comparing the four hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy control and the disease sample by the presence of differences in
the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding EST or longer gene
fragment from which the oligonucleotide/polynucleotides are obtained.
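
The comparison of the four hybridization patterns may be pictured with the following sketch, offered under stated assumptions rather than as the method of the invention. Each pattern is assumed to be a mapping from a pre-defined region to a measured signal intensity, a "difference" is assumed to mean at least a two-fold change, and the region labels and EST identifiers are hypothetical. The text itself does not define a numerical threshold.

```python
# Hypothetical illustration: find pre-defined regions whose signal differs
# between the healthy control and the disease sample, then map them to ESTs.


def differential_regions(control: dict, disease: dict,
                         min_ratio: float = 2.0, floor: float = 1.0) -> dict:
    """Return {region: disease/control ratio} for regions that differ.

    Only regions present in both patterns are compared. Signals below
    floor are raised to floor so that the ratio is always defined.
    """
    hits = {}
    for region in sorted(control.keys() & disease.keys()):
        ratio = max(disease[region], floor) / max(control[region], floor)
        if ratio >= min_ratio or ratio <= 1.0 / min_ratio:
            hits[region] = ratio
    return hits


def compare_four_patterns(first: dict, second: dict, third: dict,
                          fourth: dict, region_to_est: dict) -> list:
    """Identify ESTs at regions that differ between control and disease.

    first:  healthy control on healthy test surface
    second: healthy control on disease test surface
    third:  disease sample on healthy test surface
    fourth: disease sample on disease test surface
    """
    on_healthy_surface = differential_regions(first, third)
    on_disease_surface = differential_regions(second, fourth)
    regions = set(on_healthy_surface) | set(on_disease_surface)
    return sorted(region_to_est[r] for r in regions if r in region_to_est)
```

Comparing like with like, i.e. the two patterns obtained on the same kind of surface, keeps differences in the immobilized sequences from being mistaken for differences in expression.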

In another embodiment of the method of this invention, the
same process is employed, with the exception that the plurality of defined
oligonucleotide/polynucleotide sequences forming the healthy test sample and the
disease test sample surfaces are immobilized on a single solid support. For example,
each fragment of an EST or longer gene fragment on the surface is isolated from at
least two cDNA libraries prepared from a selected cell or tissue sample of a healthy
animal and an analogous selected cell or tissue sample of an animal having a disease.

According to this embodiment, the healthy control sample is
detectably hybridized to a copy of this single solid surface, forming one hybridization
pattern with oligonucleotide/polynucleotides associated with both the healthy and
diseased animal. Similarly, the disease sample is detectably hybridized to a second
copy of this single solid surface, forming one hybridization pattern with
oligonucleotide/polynucleotides associated with both the healthy and diseased animal.

Comparing the two hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy control and the disease sample by the presence of differences in
the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding EST or longer gene
fragment from which the oligonucleotide/polynucleotides are obtained.

The identification of one or more ESTs as the source of the
defined oligonucleotide/polynucleotide which produced a "difference" in
hybridization patterns according to these methods permits ready identification of the
gene from which those ESTs were derived. Because oligonucleotides are of sufficient
length that they will hybridize under stringent conditions only with an RNA/cDNA for
that gene to which they correspond, the oligo can be used to identify the EST and in
turn the clone from which it was derived and, by subsequent cloning, to obtain the
sequence of the full-length cDNA and its genomic counterparts, i.e., the gene, from
which it was obtained.

In other words, the ESTs identified by the method of this
invention can be employed to determine the complete sequence of the mRNA, in the
form of transcribed cDNA, by using the EST as a probe to identify a cDNA clone
corresponding to a full-length transcript, followed by sequencing of that clone. The
EST or the full length cDNA clone can also be used as a probe to identify a genomic
clone or clones that contain the complete gene including regulatory and promoter
regions, exons, and introns.

It should be appreciated that one does not have to be restricted
in using ESTs from a particular tissue from which probe RNA or cDNA is obtained;
rather, any or all ESTs (known or unknown) may be placed on the support.
Hybridization will be used to form diagnostic patterns or to identify which particular
EST is detected. For example, all known ESTs from an organism are used to produce
a "master" solid support to which control sample and disease samples are alternately
hybridized. One then detects a pattern of hybridization associated with the particular
disease state, which then forms the basis of a diagnostic test or the isolation of
disease specific ESTs from which the intact gene may be cloned and sequenced,
leading ultimately to a defined therapeutic target.

Methods for obtaining complete gene sequences from ESTs are
well-known to those of skill in the art. See, generally, Sambrook et al., cited above.
Briefly, one suitable method involves purifying the DNA from the clone that was
sequenced to give the EST and labeling the isolated insert DNA. Suitable labeling
systems are well known to those of skill in the art [see, e.g., Basic Methods in
Molecular Biology, L. G. Davis et al., ed., Elsevier Press, NY (1986)]. The labeled
EST insert is then used as a probe to screen a lambda phage cDNA library or a
plasmid cDNA library, identifying colonies containing clones related to the probe
cDNA which can be purified by known methods. The ends of the newly purified
clones are then sequenced to identify full length sequences and complete sequencing
of full length clones is performed by enzymatic digestion or primer walking. A
similar screening and clone selection approach can be applied to clones from a
genomic DNA library.

Additionally, an EST or gene identified by this method as
associated with inherited disorders can be used to determine at what stage during
embryonic development the selected gene from which it is derived is developed by
screening embryonic DNA libraries from various stages of development, e.g., 2-cell,
8-cell, etc., for the selected gene. As has been mentioned above, the invention may
be applied in additional temporal modes for monitoring the progression of a disease
state, the efficacy of a particular treatment modality or the aging process of an
individual.

Thus, the methods of this invention permit the identification,
isolation and sequencing of a gene which is differentially expressed in a selected
disease/infection. As described in more detail below, the identified gene may then be
employed to obtain any protein encoded thereby, or may be employed as a target for
diagnostic methods or therapeutic approaches to the treatment of the disease,
including, e.g., drug development.

The same methods as described above for the identification of
genes, including genes of unknown function, which are differentially expressed in a
disease state, may also be employed to identify other genes of interest. For example,
another embodiment of this invention includes a method for identifying a gene of a
pathogen which is expressed in a biological sample of an animal infected with that
pathogen or the gene of the host which is altered in its expression as a result of the
infection.

One such method employs a healthy test surface as described
above, employing defined oligonucleotide/polynucleotides from a sample of a
healthy, uninfected animal. The second such surface has immobilized at pre-defined
regions thereon a plurality of defined oligonucleotide/polynucleotide sequences of
ESTs isolated from at least one analogous tissue or cell sample of an infected animal
(the "infection test surface"). Polynucleotide sequences are isolated from a biological
sample from a healthy animal ("healthy control") and a second sample is similarly
prepared from an animal infected with the selected pathogen ("infection sample").
These two samples are desirably selected from the cell or tissue analogous to that
which provided the immobilized oligonucleotide/polynucleotides. It would also be
possible to provide samples from the nucleic acid of the pathogen itself.

According to the method, the healthy control sample is
contacted with one set of the healthy test surface and the infection test surface
described above for a time sufficient to permit detectable hybridization to occur
between the sample and the immobilized defined oligonucleotide/polynucleotides on
each surface. The results of this hybridization are a first hybridization pattern formed
between the nucleotides of healthy control and the healthy test surface and a second
hybridization pattern formed between the nucleotides of healthy control sample and
the infection test surface.

In a similar manner, the infection sample is detectably 
hybridized to another set of healthy test and infection test surfaces, forming a third 

hybridization pattern between the infection sample and healthy test surface and a
fourth hybridization pattern between the infection sample and the infection test 
surface. 

Comparing the four hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy animal and the animal infected with the pathogen by the presence
of differences in the hybridization patterns at pre-defined regions. As mentioned,
differential expression is not required and simple qualitative analysis is possible by
reference to gene expression which is simply present or absent.

A second embodiment of this method parallels the second
embodiment of the method as applied to disease above, i.e., the same process is
employed, with the exception that the plurality of defined oligonucleotide/polynucleotide
sequences forming the healthy test sample surface and the infection test sample
surface are immobilized on a single solid support. The resulting first hybridization
pattern (healthy control sample with healthy/infection test sample) and second
hybridization pattern (infection sample with healthy/infection test sample) permit
detection of those defined oligonucleotide/polynucleotides which are differentially
expressed between the healthy control and the infection sample by the presence of
differences in the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding ESTs from which the
oligonucleotide/polynucleotides are obtained.

As described above for the methods for identifying differential
gene expression between diseased and healthy animals, the
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding ESTs from which the
oligonucleotide/polynucleotide sequences are obtained and the genes expressed by the
pathogen identified for similar purposes. Other embodiments of these methods may
be developed with resort to the teaching herein, by altering the samples which provide
the defined oligonucleotide/polynucleotides. For example, an EST identified with a
differentially expressed gene by the method of this invention is also useful in
detecting genes expressed in the various stages of a pathogen's development,
particularly the infective stage, and in following the course of drug treatment and
emergence of resistant variants. For example, employing the techniques described
above, the EST can be used for detecting a gene in various stages of the parasitic
Plasmodium species life cycle, which include blood stages, liver stages, and
gametocyte stages.

B. Diagnostic Methods 

In addition to use of the methods and compositions of this
invention for identifying differentially expressed genes, another embodiment of this
invention provides diagnostic methods for diagnosing a selected disease state, or a
selected state resulting from aging, exposure to drugs or infection in an animal.
According to this aspect of the invention, a first surface, described as the healthy test
surface above, and a second surface, described as the disease test surface or infection
test surface, are prepared depending on the disease or infection to be diagnosed. The
same processes of detectable hybridization to a first and second set of these surfaces
with the healthy control sample and disease/infection sample are followed to provide
the four above-described hybridization patterns, i.e., healthy control sample with
healthy test surface; healthy control sample with disease/infection test surface;
disease/infection sample with healthy test surface; and disease/infection sample with
disease/infection test surface.

The diagnosis of disease or infection is provided by comparing
the four hybridization patterns. Substantial differences between the first and third
hybridization patterns, respectively, and the second and fourth hybridization patterns,
respectively, indicate the presence of the selected disease or infection in said animal.
Substantial similarities in the first and third hybridization patterns and second and
fourth hybridization patterns indicate the absence of disease or infection.

A similar embodiment utilizes the single surface bearing both
the healthy test surface defined oligonucleotide/polynucleotides and the
disease/infection test surface defined oligonucleotide/polynucleotides as described
above. Parallel process steps as described above for detection of genes differentially
expressed in disease and infected states are followed, resulting in a first hybridization
pattern (healthy control sample with single healthy and disease/infection test sample)
and a second hybridization pattern (disease/infection sample with another copy of the
single healthy and disease/infection test sample).

Diagnosis is accomplished by comparing the two hybridization
patterns, wherein substantial differences between the first and second hybridization
patterns indicate the presence of the selected disease or infection in the animal being
tested. Substantially similar first and second hybridization patterns indicate the
absence of disease or infection. This, like many of the foregoing embodiments, may
use known or unknown ESTs derived from many libraries.
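
The two-pattern diagnostic comparison may be sketched as below. The text does not quantify "substantial" differences, so both thresholds here (a two-fold change per region, and one quarter of the regions differing) are assumptions chosen only to make the example concrete, as are the function names and the representation of a pattern as a mapping from region to signal intensity.

```python
# Hypothetical illustration: decide whether two hybridization patterns on
# copies of the same single support differ "substantially".


def fraction_differing(first: dict, second: dict,
                       min_ratio: float = 2.0, floor: float = 1.0) -> float:
    """Fraction of shared regions whose signals differ by min_ratio or more."""
    shared = first.keys() & second.keys()
    if not shared:
        raise ValueError("the two patterns share no pre-defined regions")
    differing = 0
    for region in shared:
        ratio = max(second[region], floor) / max(first[region], floor)
        if ratio >= min_ratio or ratio <= 1.0 / min_ratio:
            differing += 1
    return differing / len(shared)


def indicates_disease(first: dict, second: dict,
                      substantial: float = 0.25) -> bool:
    """True if the fraction of differing regions reaches the assumed cutoff.

    first:  healthy control sample on the single support
    second: test sample on another copy of the same support
    """
    return fraction_differing(first, second) >= substantial
```

In practice any such cutoff would have to be established empirically for the disease and support in question; the sketch only shows the shape of the comparison.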

C. Other Methods of the Invention

As is obvious to one of skill in the art upon reading this
disclosure, the compositions and methods of this invention may also be used for other
similar purposes. For example, the general methods and compositions may be
adapted easily by manipulation of the samples selected to provide the standardized
defined oligonucleotide/polynucleotides, and selection of the samples selected for
hybridization thereto. One such modification is the use of this invention to identify
cell markers of any type, e.g., markers of cancer cells, stem cell markers, and the like.
Another modification involves the use of the method and compositions to generate
hybridization patterns useful for forensic identification or an 'expression fingerprint'
of genes for identification of one member of a species from another. Similarly, the
methods of this invention may be adapted for use in tissue matching for
transplantation purposes as well as for molecular histology, i.e., to enable diagnosis of
disease or disorders in pathology tissue samples such as biopsies. Still another use of
this method is in monitoring the effects of development and aging upon the gene
expression in a selected animal, by preparing surfaces bearing
oligonucleotide/polynucleotides prepared from samples of standardized younger
members of the species being tested. Additionally, the patient can serve as an internal
control by virtue of having the method applied to blood samples every 5-10 years
during his lifetime.

Still another intriguing use of this method is in the area of
monitoring the effects of drugs on gene expression, both in laboratories and during
clinical trials with animals, especially humans. Because the method can be readily
adapted by altering the above parameters, it can essentially be employed to identify
differentially expressed genes of any organism, at any stage of development, and
under the influence of any factor which can affect gene expression.

IV. The Genes and Proteins Identified

Application of the compositions and methods of this invention as
above described also provides other compositions, such as any isolated gene sequence
which is differentially expressed between a normal healthy animal and an animal
having a disease or infection. Another embodiment of this invention is any isolated
pathogen gene sequence which is expressed in tissue or cell samples of an infected
animal. Similarly, an embodiment of this invention is any gene sequence identified by
the methods described herein.

These gene sequences may be employed in conventional methods to
produce isolated proteins encoded thereby. To produce a protein of this invention,
the DNA sequences of a desired gene identified by the use of the methods of this
invention or portions thereof are inserted into a suitable expression system.
Desirably, a recombinant molecule or vector is constructed in which the
polynucleotide sequence encoding the protein is operably linked to a heterologous
expression control sequence permitting expression of the human protein. Numerous
types of appropriate expression vectors and host cell systems are known in the art for
mammalian (including human) expression, insect, e.g., baculovirus expression, yeast,
fungal, and bacterial expression, by standard molecular biology techniques.

The transfection of these vectors into appropriate host cells, whether
mammalian, bacterial, fungal, or insect, or into appropriate viruses, can result in
expression of the selected proteins. Suitable host cells or cell lines for transfection,
and viruses, as well as methods for the construction and transfection of such host cells
and viruses are well-known. Suitable methods for transfection, culture, amplification,
screening, and product production and purification are also known in the art.

The genes and proteins identified by this invention can be employed, if
desired, in diagnostic compositions useful for the diagnosis of a disease or infection
using conventional diagnostic assays. For example, a diagnostic reagent can be
developed which detectably targets a gene sequence or protein of this invention in a
biological sample of an animal. Such a reagent may be a complementary nucleotide
sequence, an antibody (monoclonal, recombinant or polyclonal), or a chemically
derived agonist or antagonist. Alternatively, the proteins and polynucleotide
sequences of this invention, fragments of same, or complementary sequences thereto,
may themselves be useful as diagnostic reagents for diagnosing disease states with
which the ESTs of the invention are associated. These reagents may optionally be
labelled using diagnostic labels, such as radioactive labels, colorimetric enzyme label
systems and the like conventionally used in diagnostic or therapeutic methods, e.g.,
Northern and Western blotting, antigen-antibody binding and the like. The selection
of the appropriate assay format and label system is within the skill of the art and may
readily be chosen without requiring additional explanation by resort to the wealth of
art in the diagnostic area.

Additionally, genes and proteins identified according to this invention
may be used therapeutically. For example, the EST-containing gene sequences may
be useful in gene therapy, to provide a gene sequence which in a disease is not
properly or sufficiently expressed. In such a method, a selected gene sequence of this
invention is introduced into a suitable vector or other delivery system for delivery to a
cell containing a defect in the selected gene. Suitable delivery systems are well
known to those of skill in the art and enable the desired EST or gene to be
incorporated into the target cell and to be translated by the cell. The EST or gene
sequence may be introduced to mutate the existing gene by recombination or provide
an active copy thereof in addition to the inactive gene to replace its function.

Alternatively, a protein encoded by an EST or gene of the invention
may be useful as a therapeutic reagent for delivery of a biologically active protein,
particularly when the disease state is associated with a deficiency of this protein.
Such a protein may be incorporated into an appropriate therapeutic formulation, alone
or in combination with other active ingredients. Methods of formulating such
therapeutic compositions, as well as suitable pharmaceutical carriers, and the like, are
well known to those of skill in the art. Still an additional method of delivering the
missing protein encoded by an EST, or the gene from which a selected EST was
derived, involves expressing it directly in vivo. Systems for such in vivo expression
are well known in the art.

Yet another use of the ESTs, genes identified according to the methods
of this invention, or the proteins encoded thereby is as a target for the screening and
development of natural or synthetic chemical compounds which have utility as
therapeutic drugs for the treatment of disease states associated with the identified
genes and ESTs derived therefrom. As one example, a compound capable of binding
to such a protein encoded by such a gene and either preventing or enhancing its
biological activity may be a useful drug component for the treatment or prevention of
such disease states.

Conventional assays and techniques may be used for the screening and
development of such drugs. As one example, a method for identifying compounds
which specifically bind to or inhibit or activate proteins encoded by these gene
sequences can include simply the steps of contacting a selected protein or gene
product with a test compound to permit binding of the test compound to the protein;
and determining the amount of test compound, if any, which is bound to the protein.
Such a method may involve the incubation of the test compound and the protein
immobilized on a solid support. Still other conventional methods of drug screening
can involve employing a suitable computer program to determine compounds having
similar or complementary chemical structures to that of the gene product or portions
thereof and screening those compounds either for competitive binding to the protein
to detect enhanced or decreased activity in the presence of the selected compound.
Thus, through use of such methods, the present invention is anticipated
to provide compounds capable of interacting with these genes, ESTs, or encoded
proteins, or fragments thereof, and either enhancing or decreasing the biological
activity, as desired. Such compounds are believed to be encompassed by this
invention.

Numerous modifications and variations of the present invention are
included in the above-identified specification and are expected to be obvious to one of
skill in the art. Such modifications and alterations to the compositions and processes
of the present invention are believed to be encompassed in the scope of the claims
appended hereto.

WHAT IS CLAIMED IS:

1. A method for identifying genes which are differentially expressed in
two different pre-determined states of an organism comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample in a first
state and present in excess relative to the polynucleotide to be hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample in a second
state and present in excess relative to the polynucleotide to be hybridized;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from a said organism in said first
state, said sample selected from sources analogous to the sources of step (a), said
hybridization sufficient to form a first and second hybridization pattern on each said
first and second surface,

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from said organism in said second
state, said sample selected from sources analogous to the sources of step (c), said
hybridization sufficient to form a third and fourth hybridization pattern on each said
first and second surface,

e. comparing at least two of the four hybridization patterns,
wherein genes differentially expressed in said first and second states are identified by
the presence of differences in the hybridization patterns at pre-defined regions;

f. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs or larger
gene fragment from which the oligonucleotide/polynucleotides were obtained,
whereby identification of the EST or larger gene fragment permits identification of
the gene from which the ESTs or larger gene fragment were derived.

2. The method according to Claim 1 wherein said first and second states are
respectively healthy and disease; pathogen uninfected and pathogen infected; a first
progression state and a second progression of a disease or infection; a first treatment
state and a second treatment state of a disease or infection; or a first developmental
and a second developmental state.

3. The method according to Claim 1 wherein said organism is a plant or an
animal.

4. The method according to Claim 3 wherein said animal is a human.

5. A method for identifying genes which are differentially expressed in a
normal healthy animal and an animal having a disease comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a
fragment of an EST, an entire EST, a fragment of a gene or an entire gene, isolated
from a DNA library prepared from at least one selected cell, tissue, organ or organism
sample in a healthy animal and present in excess relative to the polynucleotide to be
hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a
fragment of an EST, an entire EST, a fragment of a gene or an entire gene, isolated
from a DNA library prepared from at least one selected cell, tissue, organ or organism
sample from an animal having said disease and present in excess relative to the
polynucleotide to be hybridized;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from sources analogous to the sources of step (a), said hybridization
sufficient to form a first and second hybridization pattern on each said first and
second surface, said sample selected from a cell or tissue sample analogous to the
sample of step (a), said hybridization sufficient to form a first and second
hybridization pattern on each said first and second surface;






WO 95/21944 



PCT/US95/01863 



d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from an animal having said disease,
said sample selected from a cell or tissue sample analogous to the sample of step (c),
said hybridization sufficient to form a third and fourth hybridization pattern on each
said first and second surface,

e. comparing at least two of the four hybridization patterns,
wherein genes differentially expressed in said first and second states are identified by
the presence of differences in the hybridization patterns at pre-defined regions;

f. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs or larger
gene fragment from which the oligonucleotide/polynucleotides were obtained,
whereby identification of the EST or larger gene fragment permits identification of
the gene from which the ESTs or larger gene fragment were derived.

6. A method for identifying genes which are differentially expressed in a
normal healthy animal and an animal having a disease comprising:

a. providing a surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene isolated from a DNA library
prepared from the group selected from at least one selected cell, tissue, organ or
organism sample of a healthy animal and an analogous selected sample of an
animal having said disease and both present in excess relative to the polynucleotide to
be hybridized;

b. detectably hybridizing to a first copy of said surface
polynucleotide sequences isolated from a healthy animal, said sample selected from a
cell or tissue sample analogous to the sample of step (a), said hybridization sufficient
to form a first hybridization pattern on said surface;

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from an animal having said disease, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a second hybridization pattern on said surface;

d. comparing the two hybridization patterns, wherein genes
differentially expressed in a disease state are identified by the presence of differences
in the hybridization patterns at pre-defined regions;

e. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs from which
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST
permits identification of the gene from which the ESTs were derived.


7. A method for identifying a gene of a pathogen which is expressed in a
biological sample of an animal infected with said pathogen comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample of a
healthy, uninfected animal and present in excess relative to the polynucleotide to be
hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene isolated from at least one
selected cell, tissue, organ or organism sample of an infected animal;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form first and second hybridization patterns on each said
first and second surface,

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from an infected animal, said
sample selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form third and fourth hybridization patterns on each said
first and second surface,

e. comparing the four hybridization patterns, wherein genes of
said pathogen which are expressed in an infected animal are identified by the
presence of differences in the hybridization patterns at pre-defined regions;

f. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs from which
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST
permits identification of the gene from which the ESTs were derived.

8. A method for identifying a gene of a pathogen which is expressed in a
biological sample of an animal infected with said pathogen comprising:

a. providing a surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene isolated from a DNA library
prepared from the group selected from at least one selected cell, tissue, organ or
organism sample of a healthy animal and an analogous selected sample of an
animal having said disease and both present in excess relative to the polynucleotide to
be hybridized;

b. detectably hybridizing to a first copy of said surface
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a first hybridization pattern on said surface;

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from a sample from an infected animal, said
sample selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a second hybridization pattern on said surface;

d. comparing the two hybridization patterns, wherein genes of
said pathogen which are expressed in an infected animal are identified by the
presence of differences in the hybridization patterns at pre-defined regions;

e. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs from which
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST
permits identification of the gene from which the ESTs were derived.

9. A composition suitable for use in hybridization comprising a solid
surface on which is immobilized at pre-defined regions on said surface a plurality of
defined oligonucleotide/polynucleotide sequences for hybridization, each sequence
selected from the group consisting of a fragment of an EST, an entire EST, a fragment
of a gene or an entire gene isolated from a DNA library prepared from the group
selected from at least one selected cell, tissue, organ or organism sample of a healthy
animal, at least one analogous sample of said animal having a disease, at least one
analogous sample of said animal infected with a microbial pathogen, and any
combination thereof.

10. An isolated gene sequence which is differentially expressed in a
normal healthy animal and an animal having a disease, identified by the method of
claim 1.

11. An isolated pathogen gene sequence which is expressed in tissue or
cell samples of an infected animal identified by the method of claim 7.

12. A diagnostic composition useful for the diagnosis of a disease
comprising a reagent capable of detectably targeting a gene sequence of claim 10 in a
biological sample of an animal.

13. A diagnostic composition useful for the diagnosis of infection by a
pathogen comprising a reagent capable of detectably targeting a gene sequence of
claim 11 in a biological sample of an animal.

14. An isolated protein produced by expression of a gene sequence of
claim 10.

15. An isolated pathogen protein produced by expression of a gene
sequence of claim 11.

16. A therapeutic composition comprising a protein or fragment thereof
selected from the group consisting of a protein of claim 10 and a protein of claim 15.

17. A method for diagnosing a selected disease or infection in an animal
comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample of a healthy
animal and present in excess relative to the polynucleotide to be hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence comprising a fragment of an EST isolated from at least one
said tissue of an animal having said disease;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a DNA library prepared from a sample from a
healthy animal, said sample selected from a cell or tissue sample analogous to the
sample of step (a), said hybridization sufficient to form a first and second
hybridization pattern on each said first and second surface;

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a DNA library prepared from a sample from
an animal having said disease, said sample selected from a cell or tissue sample
analogous to the sample of step (c), said hybridization sufficient to form a third and
fourth hybridization pattern on each said first and second surface;

e. comparing the four hybridization patterns, wherein substantial
differences between the first and third hybridization patterns and the second and
fourth hybridization patterns indicates the presence of said selected disease or
infection in said animal, and substantial similarities in said first and third
hybridization patterns and second and fourth hybridization patterns indicates the
absence of disease or infection.



18. A method for diagnosing a selected disease or infection in an animal
comprising:

a. providing a surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence comprising a fragment of an EST isolated from a DNA
library prepared from the group consisting of a selected cell or tissue sample of a
healthy animal and an analogous selected cell or tissue sample of an animal having
said disease;

b. detectably hybridizing to a first copy of said surface
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a first hybridization pattern on said surface;

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from a DNA library prepared from a sample from
an animal having said disease, said sample selected from a cell or tissue sample
analogous to the sample of step (a), said hybridization sufficient to form a second
hybridization pattern on said surface;

d. comparing the two hybridization patterns, wherein substantial
differences between the first and second hybridization patterns indicates the presence
of said selected disease or infection in said animal, and substantial similarities in said
first and second hybridization patterns indicates the absence of disease or infection.
COMPARATIVE GENE TRANSCRIPT ANALYSIS

1. FIELD OF INVENTION
The present invention is in the field of molecular
biology and computer science; more particularly, the
present invention describes methods of analyzing gene
transcripts and diagnosing the genetic expression of cells
and tissue.



2. BACKGROUND OF THE INVENTION

Until very recently, the history of molecular biology
has been written one gene at a time. Scientists have
observed the cell's physical changes, isolated mixtures
from the cell or its milieu, purified proteins, sequenced
proteins and therefrom constructed probes to look for the
corresponding gene.

Recently, different nations have set up massive
projects to sequence the billions of bases in the human
genome. These projects typically begin with dividing the
genome into large portions of chromosomes and then
determining the sequences of these pieces, which are then
analyzed for identity with known proteins or portions
thereof, known as motifs. Unfortunately, the majority of
genomic DNA does not encode proteins and though it is
postulated to have some effect on the cell's ability to
make protein, its relevance to medical applications is not
understood at this time.

A third methodology involves sequencing only the
transcripts encoding the cellular machinery actively
involved in making protein, namely the mRNA. The advantage
is that the cell has already edited out all the non-coding
DNA, and it is relatively easy to identify the protein-coding
portion of the RNA. The utility of this approach
was not immediately obvious to genomic researchers. In
fact, when cDNA sequencing was initially proposed, the
method was roundly denounced by those committed to genomic
sequencing. For example, the head of the U.S. Human Genome
project discounted cDNA sequencing as not valuable and
refused to approve funding of projects.

In this disclosure, we teach methods for analyzing
DNA, including cDNA libraries. Based on our analyses and
research, we see each individual gene product as a "pixel"
of information, which relates to the expression of that,
and only that, gene. We teach herein methods whereby the
individual "pixels" of gene expression information can be
combined into a single gene transcript "image," in which
each of the individual genes can be visualized
simultaneously, allowing relationships between the gene
pixels to be easily visualized and understood.

We further teach a new method which we call electronic
subtraction. Electronic subtraction will enable the gene
researcher to turn a single image into a moving picture,
one which describes the temporality or dynamics of gene
expression, at the level of a cell or a whole tissue. It
is that sense of "motion" of cellular machinery on the
scale of a cell or organ which constitutes the new
invention herein. This constitutes a new view into the
process of living cell physiology and one which holds great
promise to unveil and discover new therapeutic and
diagnostic approaches in medicine.
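
By way of illustration only, the kind of comparison that electronic
subtraction implies can be sketched in Python. The sketch assumes that each
library has already been reduced to a list of gene identifiers, one per
sequenced clone, and that "subtraction" means differencing relative
abundances between two libraries. The function names and this particular
arithmetic are hypothetical and are not taken from the text above.

```python
from collections import Counter


def transcript_image(clone_gene_ids):
    """Count how many sequenced clones map to each gene identifier."""
    return Counter(clone_gene_ids)


def electronic_subtraction(image_a, image_b):
    """Return, per gene, relative abundance in A minus relative abundance in B.

    Assumption (not from the source): abundance is the clone count divided
    by the total number of clones in the library.
    """
    total_a = sum(image_a.values())
    total_b = sum(image_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both libraries must contain at least one clone")
    genes = set(image_a) | set(image_b)
    return {
        g: image_a.get(g, 0) / total_a - image_b.get(g, 0) / total_b
        for g in genes
    }
```

A positive value marks a gene that is relatively more abundant in the first
library, and a negative value one that is relatively more abundant in the
second.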

We teach another method which we call "electronic 
northern," which tracks the expression of a single gene 
across many types of cells and tissues. 
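
As a rough sketch of the idea, and not a description of the actual
implementation, an "electronic northern" can be pictured as looking up one
gene's relative abundance in each of several libraries. The data layout (a
mapping from library name to a list of gene identifiers, one per clone) and
the function name are assumptions made for this example.

```python
def electronic_northern(gene_id, libraries):
    """Return the fraction of clones matching gene_id in each library.

    libraries maps a library name (for example a tissue) to a list of gene
    identifiers, one entry per sequenced clone. Empty libraries report 0.0.
    """
    profile = {}
    for name, clone_gene_ids in libraries.items():
        total = len(clone_gene_ids)
        hits = sum(1 for g in clone_gene_ids if g == gene_id)
        profile[name] = hits / total if total else 0.0
    return profile
```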

Nucleic acids (DNA and RNA) carry within their
sequence the hereditary information and are therefore the
prime molecules of life. Nucleic acids are found in all
living organisms including bacteria, fungi, viruses, plants
and animals. It is of interest to determine the relative
abundance of different discrete nucleic acids in different
cells, tissues and organisms over time under various
conditions, treatments and regimes.

All dividing cells in the human body contain the same
set of 23 pairs of chromosomes. It is estimated that these
autosomal and sex chromosomes encode approximately 100,000
genes. The differences among different types of cells are
believed to reflect the differential expression of the
100,000 or so genes. Fundamental questions of biology
could be answered by understanding which genes are
transcribed and knowing the relative abundance of
transcripts in different cells.

Previously, the art has only provided for the analysis
of a few known genes at a time by standard molecular
biology techniques such as PCR, northern blot analysis, or
other types of DNA probe analysis such as in situ
hybridization. Each of these methods allows one to analyze
the transcription of only known genes and/or small numbers
of genes at a time. Nucl. Acids Res. 19, 7097-7104 (1991);
Nucl. Acids Res. 18, 4833-42 (1990); Nucl. Acids Res. 18,
2789-92 (1989); European J. Neuroscience 2, 1063-1073
(1990); Analytical Biochem. 187, 364-73 (1990); Genet.
Annals Techn. Appl. 7, 64-70 (1990); GATA 8(4), 129-33
(1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988);
Nucl. Acids Res. 19, 1954 (1991); Proc. Natl. Acad. Sci.
USA 88, 1943-47 (1991); Nucl. Acids Res. 19, 6123-27
(1991); Proc. Natl. Acad. Sci. USA 85, 5738-42 (1988);
Nucl. Acids Res. 16, 10937 (1988).

Studies of the number and types of genes whose
transcription is induced or otherwise regulated during cell
processes such as activation, differentiation, aging, viral
transformation, morphogenesis, and mitosis have been
pursued for many years, using a variety of methodologies.
One of the earliest methods was to isolate and analyze
levels of the proteins in a cell, tissue, organ system, or
even organisms both before and after the process of
interest. One method of analyzing multiple proteins in a
sample is using 2-dimensional gel electrophoresis, wherein
proteins can be, in principle, identified and quantified as
individual bands, and ultimately reduced to a discrete
signal. At present, 2-dimensional analysis only resolves
approximately 15% of the proteins. In order to positively
analyze those bands which are resolved, each band must be
excised from the membrane and subjected to protein sequence
analysis using Edman degradation. Unfortunately, most of
the bands were present in quantities too small to obtain a
reliable sequence, and many of those bands contained more
than one discrete protein. An additional difficulty is
that many of the proteins were blocked at the
amino-terminus, further complicating the sequencing
process.

Analyzing differentiation at the gene transcription
level has overcome many of these disadvantages and
drawbacks, since the power of recombinant DNA technology
allows amplification of signals containing very small
amounts of material. The most common method, called
"hybridization subtraction," involves isolation of mRNA
from the biological specimen before (B) and after (A) the
developmental process of interest, transcribing one set of
mRNA into cDNA, subtracting specimen B from specimen A
(mRNA from cDNA) by hybridization, and constructing a cDNA
library from the non-hybridizing mRNA fraction. Many
different groups have used this strategy successfully, and
a variety of procedures have been published and improved
upon using this same basic scheme. Nucl. Acids Res. 19,
7097-7104 (1991); Nucl. Acids Res. 18, 4833-42 (1990);
Nucl. Acids Res. 18, 2789-92 (1989); European J.
Neuroscience 2, 1063-1073 (1990); Analytical Biochem. 187,
364-73 (1990); Genet. Annals Techn. Appl. 7, 64-70 (1990);
GATA 8(4), 129-33 (1991); Proc. Natl. Acad. Sci. USA 85,
1696-1700 (1988); Nucl. Acids Res. 19, 1954 (1991); Proc.
Natl. Acad. Sci. USA 88, 1943-47 (1991); Nucl. Acids Res.
19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85, 5738-42
(1988); Nucl. Acids Res. 16, 10937 (1988).

Although each of these techniques has particular
strengths and weaknesses, there are still some limitations
and undesirable aspects of these methods. First, the time
and effort required to construct such libraries is quite
large. Typically, a trained molecular biologist might
expect construction and characterization of such a library
to require 3 to 6 months, depending on the level of skill,
experience, and luck. Second, the resulting subtraction
libraries are typically inferior to the libraries
constructed by standard methodology. A typical
conventional cDNA library should have a clone complexity of
at least 10^6 clones, and an average insert size of 1-3 kB.
In contrast, subtracted libraries can have complexities of
10 or 10^3 and average insert sizes of 0.2 kB. Therefore,
there can be a significant loss of clone and sequence
information associated with such libraries. Third, this
approach allows the researcher to capture only the genes
induced in specimen A relative to specimen B, not
vice-versa, nor does it easily allow comparison to a third
specimen of interest (C). Fourth, this approach requires
very large amounts (hundreds of micrograms) of "driver"
mRNA (specimen B), which significantly limits the number
and type of subtractions that are possible since many
tissues and cells are very difficult to obtain in large
quantities.

Fifth, the resolution of the subtraction is dependent 
upon the physical properties of DNA:DNA or RNA:DNA 
hybridization. The ability of a given sequence to find a 
hybridization match is dependent on its unique CoT value. 
The CoT value is a function of the number of copies 
(concentration) of the particular sequence, multiplied by 
the time of hybridization. It follows that for sequences 
which are abundant, hybridization events will occur very 
rapidly (low CoT value), while rare sequences will form 
duplexes at very high CoT values. CoT values which allow 
such rare sequences to form duplexes and therefore be 
effectively selected are difficult to achieve in a 
convenient time frame. Therefore, hybridization 
subtraction is simply not a useful technique with which to 
study relative levels of rare mRNA species. Sixth, this 
problem is further complicated by the fact that duplex 
formation is also dependent on the nucleotide base 
composition for a given sequence. Those sequences rich in 
G + C form stronger duplexes than those with high contents 
of A + T. Therefore, the former sequences will tend to be 
removed selectively by hybridization subtraction. Seventh, 
it is possible that hybridization between nonexact matches 
can occur. When this happens, the expression of a 
homologous gene may "mask" expression of a gene of 
interest, artificially skewing the results for that 
particular gene. 
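
The CoT argument above can be illustrated with a short calculation. The sketch below assumes only the relationship stated in the text (CoT = concentration multiplied by time); the function name, the target CoT value and the concentrations are hypothetical and use arbitrary but consistent units.

```python
def hybridization_time(target_cot, concentration):
    """Time needed to reach a target CoT at a given sequence concentration.

    CoT = concentration * time, so time = CoT / concentration.
    """
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    return target_cot / concentration


# Hypothetical values: an abundant message and one 1,000-fold rarer.
TARGET_COT = 10.0
abundant_time = hybridization_time(TARGET_COT, concentration=1.0)
rare_time = hybridization_time(TARGET_COT, concentration=0.001)
```

Under these assumptions the rarer sequence needs roughly 1,000 times longer to reach the same CoT, which is the practical limit the paragraph describes.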

Matsubara and Okubo proposed using partial cDNA 
sequences to establish expression profiles of genes which 
could be used in functional analyses of the human genome. 
Matsubara and Okubo warned against using random priming, as 
it creates multiple unique DNA fragments from individual 
mRNAs and may thus skew the analysis of the number of 
particular mRNAs per library. They sequenced randomly 
selected members from a 3'-directed cDNA library and 
established the frequency of appearance of the various 
ESTs. They proposed comparing lists of ESTs from various 
cell types to classify genes. Genes expressed in many 
different cell types were labeled housekeepers and those 
selectively expressed in certain cells were labeled 
cell-specific genes, even in the absence of the full sequence of 
the gene or the biological activity of the gene product. 

The present invention avoids the drawbacks of the 
prior art by providing a method to quantify the relative 
abundance of multiple gene transcripts in a given 
biological specimen by the use of high-throughput 
sequence-specific analysis of individual RNAs and/or their 
corresponding cDNAs. 

The present invention offers several advantages over 
current protein discovery methods which attempt to isolate 
individual proteins based upon biological effects. The 
method of the instant invention provides for detailed 
diagnostic comparisons of cell profiles revealing numerous 
changes in the expression of individual transcripts. 

The instant invention provides several advantages over 
current subtraction methods including a more complex 
library analysis (10^6 to 10^7 clones as compared to [illegible] 
clones), which allows identification of low abundance 
messages as well as enabling identification of messages 
which either increase or decrease in abundance. These 
large libraries are very routine to make in contrast to the 
[illegible] easily be distinguished with the method of the instant 
invention. 

This method is very convenient because it organizes a 
large quantity of data into a comprehensible, digestible 
format. The most significant differences are highlighted 
[illegible] subtraction. [illegible] 

The present invention provides several advantages over 
previous methods of electronic analysis of cDNA. The 
method is particularly powerful when more than 100 and 
preferably more than 1,000 gene transcripts are analyzed. 
In such a case, new low-frequency transcripts are 
discovered and tissue typed. 

High resolution analysis of gene expression can be 
used directly as a diagnostic profile or to identify 
disease-specific genes for the development of more classic 
diagnostic approaches. 

This process is defined as gene transcript frequency 
analysis. The resulting quantitative analysis of the gene 
transcripts is defined as comparative gene transcript 
analysis. 



3. SUMMARY OF THE INVENTION 

The invention is a method of analyzing a specimen 
containing gene transcripts comprising the steps of (a) 
producing a library of biological sequences; (b) generating 
a set of transcript sequences, where each of the transcript 
sequences in said set is indicative of a different one of 
the biological sequences of the library; (c) processing the 
transcript sequences in a programmed computer (in which a 
database of reference transcript sequences indicative of 
reference sequences is stored), to generate an identified 
sequence value for each of the transcript sequences, where 
each said identified sequence value is indicative of 
sequence annotation and a degree of match between one of 
the biological sequences of the library and at least one of 
the reference sequences; and (d) processing each said 
identified sequence value to generate final data values 
indicative of the number of times each identified sequence 
value is present in the library. 
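
Steps (c) and (d) can be sketched in a few lines. This is only an illustration under stated assumptions: each identified sequence value is modeled as an (annotation, degree of match) pair, and the gene labels, match categories and function name are invented placeholders rather than anything taken from the specification or from Table 5.

```python
from collections import Counter

# Hypothetical identified sequence values, one per transcript sequence:
# (annotation, degree of match against the reference database).
identified = [
    ("GENE_A", "exact"),
    ("GENE_A", "exact"),
    ("GENE_B", "homologous"),
    ("GENE_A", "exact"),
    (None, "unmatched"),
]


def final_data_values(identified_values):
    """Step (d): count how many times each identified sequence value occurs."""
    return Counter(identified_values)


counts = final_data_values(identified)
```

Keeping the degree of match in the key means that an exact hit and a merely homologous hit to the same reference entry are tallied separately.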

The invention also includes a method of comparing two 
specimens containing gene transcripts. The first specimen 
is processed as described above. The second specimen is 
used to produce a second library of biological sequences, 
which is used to generate a second set of transcript 
sequences, where each of the transcript sequences in the 
second set is indicative of one of the biological sequences 
of the second library. Then the second set of transcript 
sequences is processed in a programmed computer to generate 
a second set of identified sequence values, namely the 
further identified sequence values, each of which is 
indicative of a sequence annotation and includes a degree 
of match between one of the biological sequences of the 
second library and at least one of the reference sequences. 
The further identified sequence values are processed to 
generate further final data values indicative of the number 
of times each further identified sequence value is present 
in the second library. The final data values from the 
first specimen and the further identified sequence values 
from the second specimen are processed to generate ratios 
of transcript sequences, which indicate the differences in 
the number of gene transcripts between the two specimens. 

In a further embodiment, the method includes 
quantifying the relative abundance of mRNA in a biological 
specimen by (a) isolating a population of mRNA transcripts 
from a biological specimen; (b) identifying genes from 
which the mRNA was transcribed by a sequence-specific 
method; (c) determining the numbers of mRNA transcripts 
corresponding to each of the genes; and (d) using the mRNA 
transcript numbers to determine the relative abundance of 
mRNA transcripts within the population of mRNA transcripts. 

Also disclosed is a method of producing a gene 
transcript image analysis by first obtaining a mixture of 
mRNA, from which cDNA copies are made. The cDNA is 
inserted into a suitable vector which is used to transfect 
suitable host strain cells which are plated out and 
permitted to grow into clones, each clone representing a 
unique mRNA. A representative population of clones 
transfected with cDNA is isolated. Each clone in the 
population is identified by a sequence-specific method 
which identifies the gene from which the unique mRNA was 
transcribed. The number of times each gene is identified 
to a clone is determined to evaluate gene transcript 
abundance. The genes and their abundances are listed in 
order of abundance to produce a gene transcript image. 

In a further embodiment, the relative abundance of the 
gene transcripts in one cell type or tissue is compared 
with the relative abundance of gene transcript numbers in a 
second cell type or tissue in order to identify the 
differences and similarities. 

In a further embodiment, the method includes a system 
for analyzing a library of biological sequences including a 
means for receiving a set of transcript sequences, where 
each of the transcript sequences is indicative of a 
different one of the biological sequences of the library; 
and a means for processing the transcript sequences in a 
computer system in which a database of reference transcript 
sequences indicative of reference sequences is stored, 
wherein the computer is programmed with software for 
generating an identified sequence value for each of the 
transcript sequences, where each said identified sequence 
value is indicative of a sequence annotation and the degree 
of match between a different one of the biological 
sequences of the library and at least one of the reference 
sequences, and for processing each said identified sequence 
value to generate final data values indicative of the 
number of times each identified sequence value is present 
in the library. 

In essence, the invention is a method and system for 
quantifying the relative abundance of gene transcripts in a 
biological specimen. The invention provides a method for 
comparing the gene transcript image from two or more 
different biological specimens in order to distinguish 
between the two specimens and identify one or more genes 
which are differentially expressed between the two 
specimens. Thus, this gene transcript image and its 
comparison can be used as a diagnostic. One embodiment of 
the method generates high-throughput sequence-specific 
analysis of multiple RNAs or their corresponding cDNAs: a 
gene transcript image. Another embodiment of the method 
produces the gene transcript imaging analysis by the use of 
high-throughput cDNA sequence analysis. In addition, two 
or more gene transcript images can be compared and used to 
detect or diagnose a particular biological state, disease, 
or condition which is correlated to the relative abundance 
of gene transcripts in a given cell or population of cells. 

4. DESCRIPTION OF THE TABLES AND DRAWINGS 
4.1. TABLES 

Table 1 presents a detailed explanation of the letter 
codes utilized in Tables 2-5. 

Table 2 lists the one hundred most common gene 
transcripts. It is a partial list of isolates from the 
HUVEC cDNA library prepared and sequenced as described 
below. The left-hand column refers to the sequence's order 
of abundance in this table. The next column labeled 
"number" is the clone number of the first HUVEC sequence 
identification reference matching the sequence in the 
"entry" column number. Isolates that have not been 
sequenced are not present in Table 2. The next column, 
labeled "N", indicates the total number of cDNAs which have 
the same degree of match with the sequence of the reference 
transcript in the "entry" column. 

The column labeled "entry" gives the NIH GENBANK locus 
name, which corresponds to the library sequence numbers. 
The "s" column indicates in a few cases the species of the 
reference sequence. The code for column "s" is given in 
Table 1. The column labeled "descriptor" provides a plain 
English explanation of the identity of the sequence 
corresponding to the NIH GENBANK locus name in the "entry" 
column. 

Table 3 is a comparison of the top fifteen most 
abundant gene transcripts in normal monocytes and activated 
macrophage cells. 

Table 4 is a detailed summary of library subtraction 
analysis comparing the THP-1 and human macrophage 
cDNA sequences. In Table 4, the same code as in Table 2 is 
used. Additional columns are for "bgfreq" (abundance 
number in the subtractant library), "rfend" (abundance 
number in the target library) and "ratio" (the target 
abundance number divided by the subtractant abundance 
number). As is clear from perusal of the table, when the 
abundance number in the subtractant library is "0", the 
target abundance number is divided by 0.05. This is a way 
of obtaining a result (which is not possible when dividing 
by 0) and of distinguishing the result from ratios with a 
subtractant number of 1. 
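
The ratio rule described for Table 4 can be sketched as follows. The 0.05 substitute is taken from the text; the function name and the sample counts are hypothetical, and this is not the program listed in Table 5.

```python
ZERO_SUBSTITUTE = 0.05  # divisor the text says is used when the subtractant count is 0


def abundance_ratio(target_count, subtractant_count):
    """Target abundance number divided by subtractant abundance number."""
    if subtractant_count > 0:
        denominator = subtractant_count
    else:
        denominator = ZERO_SUBSTITUTE
    return target_count / denominator


# Hypothetical counts, for illustration only.
ratio_absent = abundance_ratio(3, 0)  # transcript absent from the subtractant library
ratio_single = abundance_ratio(3, 1)  # transcript seen once in the subtractant library
```

With these inputs the transcript missing from the subtractant library scores about 60 while the one seen once scores 3, so the two cases remain clearly distinguishable in a sorted table.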

Table 5 is the computer program, written in source 
code, for generating gene transcript subtraction profiles. 

Table 6 is a partial listing of database entries used 
in the electronic northern blot analysis as provided by the 
present invention. 

4.2. BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a chart summarizing data collected and 
stored regarding the library construction portion of 
sequence preparation and analysis. 

Figure 2 is a diagram representing the sequence of 
operations performed by "abundance sort" software in a 
class of preferred embodiments of the inventive method. 

Figure 3 is a block diagram of a preferred embodiment 
of the system of the invention. 

Figure 4 is a more detailed block diagram of the 
bioinformatics process from new sequence (that has already 
been sequenced but not identified) to printout of the 
transcript imaging analysis and the provision of database 
subscriptions. 

5. DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides a method to compare the 
relative abundance of gene transcripts in different 
biological specimens by the use of high-throughput 
sequence-specific analysis of individual RNAs or their 
corresponding cDNAs (or alternatively, of data representing 
other biological sequences). This process is denoted 
herein as gene transcript imaging. The quantitative 
analysis of the relative abundance for a set of gene 
transcripts is denoted herein as "gene transcript image 
analysis" or "gene transcript frequency analysis". The 
present invention allows one to obtain a profile for gene 
transcription in any given population of cells or tissue 
from any type of organism. The invention can be applied to 
obtain a profile of a specimen consisting of a single cell 
(or clones of a single cell), or of many cells, or of 
tissue more complex than a single cell and containing 
multiple cell types, such as liver. 

The invention has significant advantages in the fields 
of diagnostics, toxicology and pharmacology, to name a few. 
A highly sophisticated diagnostic test can be performed on 
the ill patient in whom a diagnosis has not been made. A 
biological specimen consisting of the patient's fluids or 
tissues is obtained, and the gene transcripts are isolated 
and expanded to the extent necessary to determine their 
identity. Optionally, the gene transcripts can be 
converted to cDNA. A sampling of the gene transcripts is 
subjected to sequence-specific analysis and quantified. 
These gene transcript sequence abundances are compared 
against reference database sequence abundances including 
normal data sets for diseased and healthy patients. The 
patient has the disease(s) with which the patient's data 
set most closely correlates. 
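
One way to picture the comparison step is sketched below. The text does not say how correlation is measured, so the use of the Pearson coefficient is an assumption, as are the function names, gene labels and abundance values; a real reference database would hold far more transcripts per profile.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def closest_reference(patient, references):
    """Name of the reference profile that correlates best with the patient's."""
    genes = sorted(patient)
    p = [patient[g] for g in genes]
    return max(
        references,
        key=lambda name: pearson(p, [references[name].get(g, 0.0) for g in genes]),
    )


# Hypothetical relative abundances for three transcripts.
references = {
    "healthy": {"GENE_A": 0.50, "GENE_B": 0.30, "GENE_C": 0.20},
    "disease_X": {"GENE_A": 0.10, "GENE_B": 0.30, "GENE_C": 0.60},
}
patient = {"GENE_A": 0.15, "GENE_B": 0.25, "GENE_C": 0.60}
best = closest_reference(patient, references)
```

In this toy data set the patient profile tracks the "disease_X" reference and runs opposite to the "healthy" one, so "disease_X" is selected.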

For example, gene transcript frequency analysis can be 
used to differentiate normal cells or tissues from diseased 
cells or tissues, just as it highlights differences between 
normal monocytes and activated macrophages in Table 3. 

In toxicology, a fundamental question is which tests 
are most effective in predicting or detecting a toxic 
effect. Gene transcript imaging provides highly detailed 
information on the cell and tissue environment, some of 
which would not be obvious in conventional, less detailed 
screening methods. The gene transcript image is a more 
powerful method to predict drug toxicity and efficacy. 
Similar benefits accrue in the use of this tool in 
pharmacology. The gene transcript image can be used 
selectively to look at protein categories which are 
expected to be affected, for example, enzymes which 
detoxify toxins. 

In an alternative embodiment, comparative gene 
transcript frequency analysis is used to differentiate 
between cancer cells which respond to anti-cancer agents 
and those which do not respond. Examples of anti-cancer 
agents are tamoxifen, vincristine, vinblastine, 
podophyllotoxins, etoposide, teniposide, cisplatin, 
biologic response modifiers such as interferon, IL-2, 
GM-CSF, enzymes, hormones and the like. This method also 
provides a means for sorting the gene transcripts by 
functional category. In the case of cancer cells, 
transcription factors or other essential regulatory 
molecules are very important categories to analyze across 
different libraries. 

In yet another embodiment, comparative gene transcript 
frequency analysis is used to differentiate between control 
liver cells and liver cells isolated from patients treated 
with experimental drugs like FIAU to distinguish between 
pathology caused by the underlying disease and that caused 
by the drug. 

In yet another embodiment, comparative gene transcript 
frequency analysis is used to differentiate between brain 
tissue from patients treated and untreated with lithium. 

In a further embodiment, comparative gene transcript 
frequency analysis is used to differentiate between 
cyclosporin and FK506-treated cells and normal cells. 

In a further embodiment, comparative gene transcript 
frequency analysis is used to differentiate between virally 
infected (including HIV-infected) human cells and 
uninfected human cells. Gene transcript frequency analysis 
is also used to rapidly survey gene transcripts in 
HIV-resistant, HIV-infected, and HIV-sensitive cells. 
Comparison of gene transcript abundance will indicate the 
success of treatment and/or new avenues to study. 

In a further embodiment, comparative gene transcript 
frequency analysis is used to differentiate between 
bronchial lavage fluids from healthy and unhealthy patients 
with a variety of ailments. 

In a further embodiment, comparative gene transcript 
frequency analysis is used to differentiate between cell, 
plant, microbial and animal mutants and wild-type species. 
In addition, the transcript abundance program is adapted to 
permit the scientist to evaluate the transcription of one 
gene in many different tissues. Such comparisons could 
identify deletion mutants which do not produce a gene 
product and point mutants which produce a less abundant or 
otherwise different message. Such mutations can affect 
basic biochemical and pharmacological processes, such as 
mineral nutrition and metabolism, and can be isolated by 
means known to those skilled in the art. Thus, crops with 
improved yields, pest resistance and other factors can be 
developed. 

In a further embodiment, comparative gene transcript 
frequency analysis is used for an interspecies comparative 
analysis which would allow for the selection of better 
pharmacologic animal models. In this embodiment, humans 
and other animals (such as a mouse), or their cultured 
cells, are treated with a specific test agent. The relative 
sequence abundance of each cDNA population is determined. 
If the animal test system is a good model, homologous genes 
in the animal cDNA population should change expression 
similarly to those in human cells. If side effects are 
detected with the drug, a detailed transcript abundance 
analysis will be performed to survey gene transcript 
changes. Models will then be evaluated by comparing basic 
physiological changes. 

In a further embodiment, comparative gene transcript 
frequency analysis is used in a clinical setting to give a 
highly detailed gene transcript profile of a patient's 
cells or tissue (for example, a blood sample). In 
particular, gene transcript frequency analysis is used to 
give a high resolution gene expression profile of a 
diseased state or condition. 

In the preferred embodiment, the method utilizes 
high-throughput cDNA sequencing to identify specific 
transcripts of interest. The generated cDNA and deduced 
amino acid sequences are then extensively compared with 
GENBANK and other sequence data banks as described below. 
The method offers several advantages over current protein 
discovery by two-dimensional gel methods which try to 
identify individual proteins involved in a particular 
biological effect. Here, detailed comparisons of profiles 
of activated and inactive cells reveal numerous changes in 
the expression of individual transcripts. After it is 
determined whether the sequence is an "exact" match, a 
similar match or a non-match, the sequence is entered into 
a database. Next, the numbers of copies of cDNA 
corresponding to each gene are tabulated. Although this can 
be done slowly and arduously, if at all, by human hand from 
a printout of all entries, a computer program is a useful 
and rapid way to tabulate this information. The numbers of 
cDNA copies (optionally divided by the total number of 
sequences in the data set) provide a picture of the 
relative abundance of transcripts for each corresponding 
gene. The list of represented genes can then be sorted by 
abundance in the cDNA population. A multitude of additional 
types of comparisons or dimensions are possible and are 
exemplified below. 
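
The tabulate, normalize and sort sequence just described can be sketched briefly. This is a minimal illustration, not the program of Table 5; the function name and gene labels are hypothetical, and ties are broken alphabetically only to make the ordering deterministic.

```python
from collections import Counter


def transcript_image(gene_per_clone):
    """Tabulate clones per gene, divide by the total, and sort by abundance."""
    counts = Counter(gene_per_clone)
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, n, n / total) for gene, n in ordered]


# Hypothetical gene assignments for six sequenced clones.
clones = ["GENE_B", "GENE_A", "GENE_B", "GENE_C", "GENE_B", "GENE_A"]
image = transcript_image(clones)
```

Each row of `image` holds the gene, its raw clone count and its fraction of the data set, with the most abundant gene first.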

An alternate method of producing a gene transcript 
image includes the steps of obtaining a mixture of test 
mRNA and providing a representative array of unique probes 
whose sequences are complementary to at least some of the 
test mRNAs. Next, a fixed amount of the test mRNA is added 
to the arrayed probes. The test mRNA is incubated with the 
probes for a sufficient time to allow hybrids of the test 
mRNA and probes to form. The mRNA-probe hybrids are 
detected and the quantity determined. The hybrids are 
identified by their location in the probe array. The 
quantity of each hybrid is summed to give a population 
number. Each hybrid quantity is divided by the population 
number to provide a set of relative abundance data termed a 
gene transcript image analysis. 
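
The arithmetic of the array-based alternative is shown in the sketch below, assuming hybrid quantities are available as one number per array position. The (row, column) keys, signal values and function name are hypothetical.

```python
def image_from_array(hybrid_quantities):
    """Divide each hybrid quantity by the summed population number."""
    population = sum(hybrid_quantities.values())
    if population == 0:
        raise ValueError("no hybrids detected")
    return {loc: qty / population for loc, qty in hybrid_quantities.items()}


# Hypothetical hybrid quantity per array position (row, column).
signals = {(0, 0): 120.0, (0, 1): 60.0, (1, 0): 20.0}
relative = image_from_array(signals)
```

Because each hybrid is identified by its position, the position key stands in for the probe identity; mapping positions to gene names would be a separate lookup.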

6. EXAMPLES 

The examples below are provided to illustrate the 
subject invention. These examples are provided by way of 
illustration and are not included for the purpose of 
limiting the invention. 

6.1. TISSUE SOURCES AND CELL LINES 

For analysis with the computer program claimed herein, 
biological sequences can be obtained from virtually any 
source. Most popular are tissues obtained from the human 
body. Tissues can be obtained from any organ of the body, 
any age donor, any abnormality or any immortalized cell 
line. Immortal cell lines may be preferred in some 
instances because of their purity of cell type; other 
tissue samples invariably include mixed cell types. A 
special technique is available to take a single cell (for 
example, a brain cell) and harness the cellular machinery 
to grow up sufficient cDNA for sequencing by the techniques 
and analysis described herein (cf. U.S. Patent Nos. 
5,021,335 and 5,168,038, which are incorporated by 
reference). The examples given herein utilized the 
following immortalized cell lines: monocyte-like U-937 
cells, activated macrophage-like THP-1 cells, induced 
vascular endothelial cells (HUVEC cells) and mast cell-like 
HMC-1 cells. 

The U-937 cell line is a human histiocytic lymphoma 
cell line with monocyte characteristics, established from 
malignant cells obtained from the pleural effusion of a 
patient with diffuse histiocytic lymphoma (Sundstrom, C. 
and Nilsson, K. (1976) Int. J. Cancer 17:565). U-937 is 
one of only a few human cell lines with the morphology, 
cytochemistry, surface receptors and monocyte-like 
characteristics of histiocytic cells. These cells can be 
induced to terminal monocytic differentiation and will 
express new cell surface molecules when activated with 
supernatants from human mixed lymphocyte cultures. Upon 
this type of in vitro activation, the cells undergo 
morphological and functional changes, including 
augmentation of antibody-dependent cellular cytotoxicity 
(ADCC) against erythroid and tumor target cells (one of the 
principal functions of macrophages). Activation of U-937 
cells with phorbol 12-myristate 13-acetate (PMA) in vitro 
stimulates the production of several compounds, including 
prostaglandins, leukotrienes and platelet-activating factor 
(PAF), which are potent inflammatory mediators. Thus, 
U-937 is a cell line that is well suited for the 
identification and isolation of gene transcripts associated 
with normal monocytes. 

The HUVEC cell line is a normal, homogeneous, well 
characterized, early passage endothelial cell culture from 
human umbilical vein (Cell Systems Corp., 12815 NE 124th 
Street, Kirkland, WA 98034). Only gene transcripts from 
induced, or treated, HUVEC cells were sequenced. One batch 
of 1 x 10^8 cells was treated for 5 hours with 1 U/ml rIL-1b 
and 100 ng/ml E. coli lipopolysaccharide (LPS) endotoxin 
prior to harvesting. A separate batch of 2 x 10^8 cells was 
treated at confluence with 4 U/ml TNF and 2 U/ml 
interferon-gamma (IFN-gamma) prior to harvesting. 

THP-1 is a human leukemic cell line with distinct 
monocytic characteristics. This cell line was derived from 
the blood of a 1-year-old boy with acute monocytic leukemia 
(Tsuchiya, S. et al. (1980) Int. J. Cancer: 171-76). The 
following cytological and cytochemical criteria were used 
to determine the monocytic nature of the cell line: 1) the 
presence of alpha-naphthyl butyrate esterase activity which 
could be inhibited by sodium fluoride; 2) the production of 
lysozyme; 3) the phagocytosis of latex particles and 
sensitized SRBC (sheep red blood cells); and 4) the ability 
of mitomycin C-treated THP-1 cells to activate 
T-lymphocytes following ConA (concanavalin A) treatment. 
Morphologically, the cytoplasm contained small azurophilic 
granules and the nucleus was indented and irregularly 
shaped with deep folds. The cell line had Fc and C3b 
receptors, probably functioning in phagocytosis. THP-1 
cells treated with the tumor promoter 
12-O-tetradecanoyl-phorbol-13-acetate (TPA) stop proliferating and 
differentiate into macrophage-like cells which mimic native 
monocyte-derived macrophages in several respects. 
Morphologically, as the cells change shape, the nucleus 
becomes more irregular and additional phagocytic vacuoles 
appear in the cytoplasm. The differentiated THP-1 cells 
also exhibit an increased adherence to tissue culture 
plastic. 

HMC-1 cells (a human mast cell line) were established 
from the peripheral blood of a Mayo Clinic patient with 
mast cell leukemia (Leukemia Res. (1988) 12:345-55). The 
cultured cells looked similar to immature cloned murine 
mast cells, contained histamine, and stained positively for 
chloroacetate esterase, amino caproate esterase, eosinophil 
major basic protein (MBP) and tryptase. The HMC-1 cells 
have, however, lost the ability to synthesize normal IgE 
receptors. HMC-1 cells also possess a 10;16 translocation, 
present in cells initially collected by leukophoresis from 
the patient and not an artifact of culturing. Thus, HMC-1 
cells are a good model for mast cells. 



6.2. CONSTRUCTION OF cDNA LIBRARIES 

For inter-library comparisons, the libraries must be 
prepared in similar manners. Certain parameters appear to 
be particularly important to control. One such parameter 
is the method of isolating mRNA. It is important to use 
the same conditions to remove DNA and heterogeneous nuclear 
RNA from comparison libraries. Size fractionation of cDNA 
must be carefully controlled. The same vector preferably 
should be used for preparing libraries to be compared. At 
the very least, the same type of vector (e.g., 
unidirectional vector) should be used to assure a valid 
comparison. A unidirectional vector may be preferred in 
order to more easily analyze the output. 

It is preferred to prime only with oligo dT 
unidirectional primer in order to obtain only one clone per 
mRNA transcript when obtaining cDNAs. However, it is 
recognized that employing a mixture of oligo dT and random 
primers can also be advantageous because such a mixture 
results in more sequence diversity when gene discovery also 
is a goal. Similar effects can be obtained with DR2 
(Clontech) and HXLOX (US Biochemical) and also vectors from 
Invitrogen and Novagen. These vectors have two 
requirements. First, there must be primer sites for 
commercially available primers such as T3 or M13 reverse 
primers. Second, the vector must accept inserts up to 10 
kB. 



It also is important that the clones be randomly 
sampled, and that a significant population of clones is 
used. Data have been generated with 5,000 clones; however, 
if very rare genes are to be obtained and/or their relative 
abundance determined, as many as 100,000 clones from a 
single library may need to be sampled. Size fractionation 
of cDNA also must be carefully controlled. Alternately, 
plaques can be selected, rather than clones. 
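
The difference between sampling 5,000 and 100,000 clones can be made concrete with a simple probability estimate. The sketch assumes each sampled clone is an independent random draw and that the transcript of interest makes up a fixed fraction of the library; the 1-in-50,000 frequency and the function name are hypothetical.

```python
def detection_probability(frequency, clones_sampled):
    """Chance of sampling at least one clone of a transcript present at the
    given fractional frequency, assuming independent random sampling."""
    return 1.0 - (1.0 - frequency) ** clones_sampled


# Hypothetical rare transcript: 1 in 50,000 messages.
p_5k = detection_probability(1 / 50_000, 5_000)
p_100k = detection_probability(1 / 50_000, 100_000)
```

Under these assumptions such a transcript would be seen in fewer than one in ten runs of 5,000 clones, but in well over eight in ten runs of 100,000 clones.
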

Besides the Uni-ZAP™ vector system by Stratagene 
disclosed below, it is now believed that other similarly 
unidirectional vectors also can be used. For example, it 
is believed that such vectors include but are not limited 
to DR2 (Clontech), and HXLOX (U.S. Biochemical). 

Preferably, the details of library construction (as 
shown in Figure 1) are collected and stored in a database 
for later retrieval relative to the sequences being 
compared. Fig. 1 shows important information regarding the 
library collaborator or cell or cDNA supplier, 
pretreatment, biological source, culture, mRNA preparation 
and cDNA construction. Similarly detailed information 
about the other steps is beneficial in analyzing sequences 
and libraries in depth. 
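
A database row holding these construction details might be modeled as in the sketch below. The field names are assumptions loosely following the categories listed for Fig. 1, and the sample values are illustrative placeholders rather than the actual contents of that figure.

```python
from dataclasses import dataclass, asdict


@dataclass
class LibraryRecord:
    """Construction details for one cDNA library (hypothetical field names)."""
    library_name: str
    supplier: str
    pretreatment: str
    biological_source: str
    culture: str
    mrna_preparation: str
    cdna_construction: str


# Illustrative values only.
record = LibraryRecord(
    library_name="EXAMPLE_LIB_1",
    supplier="example supplier",
    pretreatment="none",
    biological_source="cultured cell line",
    culture="standard medium",
    mrna_preparation="poly(A+) selection",
    cdna_construction="oligo dT primed, unidirectional vector",
)
row = asdict(record)  # plain dict, ready to store alongside sequence data
```

Storing one such record per library lets any later comparison check that the two libraries were prepared in similar ways.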

RNA must be harvested from cells and tissue samples 
and cDNA libraries are subsequently constructed. cDNA 
libraries can be constructed according to techniques known 
in the art. (See, for example, Maniatis, T. et al. (1982) 
Molecular Cloning, Cold Spring Harbor Laboratory, New 
York). cDNA libraries may also be purchased. The U-937 
cDNA library (catalog No. 937207) was obtained from 
Stratagene, Inc., 11099 N. Torrey Pines Rd., La Jolla, CA 
92037. 

The THP-1 cDNA library was custom constructed by 
Stratagene from THP-1 cells cultured 48 hours with 100 nM 
TPA and 4 hours with 1 µg/ml LPS. The human mast cell 
HMC-1 cDNA library was also custom constructed by Stratagene 
from cultured HMC-1 cells. The HUVEC cDNA library was 
custom constructed by Stratagene from two batches of 
induced HUVEC cells which were separately processed. 

Essentially, all the libraries were prepared in the 
same manner. First, poly(A+)RNA (mRNA) was purified. For 
the U-937 and HMC-1 RNA, cDNA synthesis was only primed 
with oligo dT. For the THP-1 and HUVEC RNA, cDNA synthesis 
was primed separately with both oligo dT and random 
hexamers, and the two cDNA libraries were treated 
separately. Synthetic adaptor oligonucleotides were 
ligated onto cDNA ends enabling its insertion into the 
Uni-ZAP™ vector system (Stratagene), allowing high efficiency 
unidirectional (sense orientation) lambda library 
construction and the convenience of a plasmid system with 
blue-white color selection to detect clones with cDNA 
insertions. Finally, the two libraries were combined into 
a single library by mixing equal numbers of bacteriophage. 

The libraries can be screened with either DNA probes 
or antibody probes and the pBluescript® phagemid 
(Stratagene) can be rapidly excised in vivo. The phagemid 
allows the use of a plasmid system for easy insert 
characterization, sequencing, site-directed mutagenesis, 
the creation of unidirectional deletions and expression of 
fusion proteins. The custom-constructed library phage 
particles were infected into E. coli host strain XL1-Blue® 
(Stratagene), which has a high transformation efficiency, 
increasing the probability of obtaining rare, 
under-represented clones in the cDNA library. 

6.3. ISOLATION OF cDNA CLONES 

The phagemid forms of individual cDNA clones were 
obtained by the in vivo excision process, in which the host 
bacterial strain was coinfected with both the lambda 
library phage and an f1 helper phage. Proteins derived 
from both the library-containing phage and the helper phage 
nicked the lambda DNA, initiated new DNA synthesis from 
defined sequences on the lambda target DNA and created a 
smaller, single stranded circular phagemid DNA molecule 
that included all DNA sequences of the pBluescript® plasmid 
and the cDNA insert. The phagemid DNA was secreted from 
the cells and purified, then used to re-infect fresh host 
cells, where the double stranded phagemid DNA was produced. 
Because the phagemid carries the gene for beta-lactamase, 
the newly-transformed bacteria are selected on medium 
containing ampicillin. 

Phagemid DNA was purified using the Magic Minipreps™ 
DNA Purification System (Promega catalogue #A7100, Promega 
Corp., 2800 Woods Hollow Rd., Madison, WI 53711). This 
small-scale process provides a simple and reliable method 
for lysing the bacterial cells and rapidly isolating 
purified phagemid DNA using a proprietary DNA-binding 
resin. The DNA was eluted from the purification resin 
already prepared for DNA sequencing and other analytical 
manipulations. 

Phagemid DNA was also purified using the QIAwell-8 
Plasmid Purification System from QIAGEN® DNA Purification 
System (QIAGEN Inc., 9259 Eton Ave., Chatsworth, CA 
91311). This product line provides a convenient, rapid and 
reliable high-throughput method for lysing the bacterial 
cells and isolating highly purified phagemid DNA using 
QIAGEN anion-exchange resin particles with EMPORE™ membrane 
technology from 3M in a multiwell format. The DNA was 
eluted from the purification resin already prepared for DNA 
sequencing and other analytical manipulations. 

An alternate method of purifying phagemid has recently 
become available. It utilizes the Miniprep Kit (Catalog 
No. 77468, available from Advanced Genetic Technologies 
Corp., 19212 Orbit Drive, Gaithersburg, Maryland). This 
kit is in the 96-well format and provides enough reagents 
for 960 purifications. Each kit is provided with a 
recommended protocol, which has been employed except for 
the following changes. First, the 96 wells are each filled 
with only 1 ml of sterile terrific broth with carbenicillin 
at 25 mg/L and glycerol at 0.4%. After the wells are 
inoculated, the bacteria are cultured for 24 hours and 
lysed with 60 µl of lysis buffer. A centrifugation step 
(2900 rpm for 5 minutes) is performed before the contents 
of the block are added to the primary filter plate. The 
optional step of adding isopropanol to TRIS buffer is not 
routinely performed. After the last step in the protocol, 
samples are transferred to a Beckman 96-well block for 
storage. 

Another new DNA purification system is the WIZARD™ 
product line which is available from Promega (catalog No. 
A7071) and may be adaptable to the 96-well format. 






WO 95/20681 



PCT/US95/01160 



6.4. SEQUENCING OF cDNA CLONES 

The cDNA inserts from random isolates of the U-937 and 
THP-1 libraries were sequenced in part. Methods for DNA 
sequencing are well known in the art. Conventional 
enzymatic methods employ DNA polymerase Klenow fragment, 
Sequenase™ or Taq polymerase to extend DNA chains from an 
oligonucleotide primer annealed to the DNA template of 
interest. Methods have been developed for the use of both 
single- and double-stranded templates. The chain 
termination reaction products are usually electrophoresed 
on urea-acrylamide gels and are detected either by 
autoradiography (for radionuclide-labeled precursors) or by 
fluorescence (for fluorescent-labeled precursors). Recent 
improvements in mechanized reaction preparation, sequencing 
and analysis using the fluorescent detection method have 
permitted expansion in the number of sequences that can be 
determined per day (such as the Applied Biosystems 373 and 
377 DNA sequencer, Catalyst 800). Currently with the 
system as described, read lengths range from 250 to 400 
bases and are clone dependent. Read length also varies 
with the length of time the gel is run. In general, the 
shorter runs tend to truncate the sequence. A minimum of 
only about 25 to 50 bases is necessary to establish the 
identification and degree of homology of the sequence. 
Gene transcript imaging can be used with any sequence- 
specific method, including, but not limited to, 
hybridization, mass spectroscopy, capillary electrophoresis 
and 505 gel electrophoresis. 

6.5. HOMOLOGY SEARCHING OF cDNA CLONE AND 
DEDUCED PROTEIN (and Subsequent Steps) 

Using the nucleotide sequences derived from the cDNA 
clones as query sequences (sequences of a Sequence 
Listing), databases containing previously identified 
sequences are searched for areas of homology (similarity). 
Examples of such databases include Genbank and EMBL. We 
next describe examples of two homology search algorithms 
that can be used, and then describe the subsequent 
computer-implemented steps to be performed in accordance 
with preferred embodiments of the invention. 

In the following description of the computer- 
implemented steps of the invention, the word "library" 
denotes a set (or population) of biological specimen 
nucleic acid sequences. A "library" can consist of cDNA 
sequences, RNA sequences, or the like, which characterize a 
biological specimen. The biological specimen can consist 
of cells of a single human cell type (or can be any of the 
other above-mentioned types of specimens). We contemplate 
that the sequences in a library have been determined so as 
to accurately represent or characterize a biological 
specimen (for example, they can consist of representative 
cDNA sequences from clones of RNA taken from a single human 
cell). 

In the following description of the computer- 
implemented steps of the invention, the expression 
"database" denotes a set of stored data which represent a 
collection of sequences, which in turn represent a 
collection of biological reference materials. For example, 
a database can consist of data representing many stored 
cDNA sequences which are in turn representative of human 
cells infected with various viruses, cells of humans of 
various ages, cells from different mammalian species, and 
so on. 

In preferred embodiments, the invention employs a 
computer programmed with software (to be described) for 
performing the following steps: 

(a) processing data indicative of a library of cDNA 
sequences (generated as a result of high-throughput cDNA 
sequencing or other method) to determine whether each 
sequence in the library matches a DNA sequence of a 
reference database of DNA sequences (and if so, identifying 
the reference database entry which matches the sequence and 
indicating the degree of match between the reference 
sequence and the library sequence) and assigning an 
identified sequence value based on the sequence annotation 
and degree of match to each of the sequences in the 
library; 

(b) for some or all entries of the database, 
tabulating the number of matching identified sequence 
values in the library (although this can be done by human 
hand from a printout of all entries, we prefer to perform 
this step using computer software to be described below), 
thereby generating a set of final data values or "abundance 
numbers"; and 

(c) if the libraries are different sizes, dividing 
each abundance number by the total number of sequences in 
the library, to obtain a relative abundance number for each 
identified sequence value (i.e., a relative abundance of 
each gene transcript). 

The list of identified sequence values (or genes 
corresponding thereto) can then be sorted by abundance in 
the cDNA population. A multitude of additional types of 
comparisons or dimensions are possible. 
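
By way of illustration only, steps (b) and (c) and the abundance sort can be sketched in Python as follows. This is a minimal sketch and is not the Table 5 listing; the function name `abundance_table` and the sample gene labels are assumptions made for the example.

```python
from collections import Counter

def abundance_table(identified_values):
    """Tabulate abundance numbers for one library.

    identified_values: one identified sequence value per sequenced clone.
    Returns rows of (value, abundance number, relative abundance),
    sorted by decreasing abundance (ties broken alphabetically).
    """
    counts = Counter(identified_values)   # step (b): abundance numbers
    total = len(identified_values)        # total sequences in the library
    rows = [(value, n, n / total) for value, n in counts.items()]  # step (c)
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

# Hypothetical library: each entry is the gene matched by one clone.
library = ["actin", "EF-1 alpha", "actin", "ferritin", "actin", "EF-1 alpha"]
table = abundance_table(library)
for value, abundance, relative in table:
    print(f"{value:12s} {abundance:3d} {relative:.3f}")
```

Dividing by the library size, as in step (c), is what makes the numbers comparable when libraries of different sizes are set side by side.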

For example (to be described below in greater detail), 
steps (a) and (b) can be repeated for two different 
libraries (sometimes referred to as a "target" library and 
a "subtractant" library). Then, for each identified 
sequence value (or gene transcript), a "ratio" value is 
obtained by dividing the abundance number (for that 
identified sequence value) for the target library, by the 
abundance number (for that identified sequence value) for 
the subtractant library. 

In fact, subtraction may be carried out on multiple 
libraries. It is possible to add the transcripts from 
several libraries (for example, three) and then to divide 
them by another set of transcripts from multiple libraries 
(again, for example, three). Notation for this operation 
may be abbreviated as (A+B+C)/(D+E+F), where the capital 
letters each indicate an entire library. Optionally, the 
abundance numbers of transcripts in the summed libraries 
may be divided by the total sample size before subtraction. 
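
The (A+B+C)/(D+E+F) operation can be sketched as below. This is a hedged illustration, not the disclosed program: the name `pooled_ratios`, the `normalize` flag and the gene labels g1 to g3 are assumptions, and only identified sequence values present on both sides are given a ratio here.

```python
from collections import Counter

def pooled_ratios(target_libs, subtractant_libs, normalize=False):
    """Sum abundance numbers over each group of libraries, then divide.

    Each library is a list of identified sequence values, one per clone.
    With normalize=True, each pooled abundance is first divided by the
    total sample size of its group.
    """
    target = sum((Counter(lib) for lib in target_libs), Counter())
    subtractant = sum((Counter(lib) for lib in subtractant_libs), Counter())
    t_total = sum(target.values()) if normalize else 1
    s_total = sum(subtractant.values()) if normalize else 1
    return {
        value: (target[value] / t_total) / (subtractant[value] / s_total)
        for value in target
        if value in subtractant
    }

A, B = ["g1", "g1", "g2"], ["g1"]          # target group (A+B)
D, E = ["g1", "g2"], ["g2", "g3"]          # subtractant group (D+E)
ratios = pooled_ratios([A, B], [D, E])
print(ratios)
```

A single target library against a single subtractant library is the special case in which each group contains one list.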

Unlike standard hybridization technology, which permits 
a single subtraction of two libraries, once one has 
processed a set or library of transcript sequences and 
stored them in the computer, any number of subtractions can 
be performed on the library. For example, by this method, 
ratio values can be obtained by dividing relative abundance 
values in a first library by corresponding values in a 
second library and vice versa. 

In variations on step (a), the library consists of 
nucleotide sequences derived from cDNA clones. Examples of 
databases which can be searched for areas of homology 
(similarity) in step (a) include the commercially available 
databases known as Genbank (NIH), EMBL (European Molecular 
Biology Labs, Germany), and GENESEQ (Intelligenetics, 
Mountain View, California). 

One homology search algorithm which can be used to 
implement step (a) is the algorithm described in the paper 
by D.J. Lipman and W.R. Pearson, entitled "Rapid and 
Sensitive Protein Similarity Searches," Science, 227:1435 
(1985). In this algorithm, the homologous regions are 
searched in a two-step manner. In the first step, the 
highest homologous regions are determined by calculating a 
matching score using a homology score table. The parameter 
"Ktup" is used in this step to establish the minimum window 
size to be shifted for comparing two sequences. Ktup also 
sets the number of bases that must match to extract the 
highest homologous region among the sequences. In this 
step, no insertions or deletions are applied and the 
homology is displayed as an initial (INIT) value. 

In the second step, the homologous regions are aligned 
to obtain the highest matching score by inserting a gap in 
order to add a probable deleted portion. The matching 
score obtained in the first step is recalculated using the 
homology score table and the insertion score table to an 
optimized (OPT) value in the final output. 
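
The role of Ktup in the first, ungapped step can be illustrated with a much simplified sketch. It is not the Lipman and Pearson program and computes no INIT or OPT score; it only counts exact Ktup-length word matches along each diagonal, and the name `best_diagonal` and the test sequences are assumptions made for the example.

```python
from collections import defaultdict

def best_diagonal(query, subject, ktup=4):
    """Find the diagonal with the most exact ktup-length word matches.

    A diagonal is identified by (subject position - query position).
    No insertions or deletions are considered, as in the first step.
    Returns (diagonal, match count), or None if no word is shared.
    """
    # Index every ktup-length word of the query by its start positions.
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)

    # Slide a ktup-length window along the subject and tally diagonals.
    diagonals = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1

    if not diagonals:
        return None
    return max(diagonals.items(), key=lambda item: item[1])

hit = best_diagonal("ACGTACGTGG", "TTACGTACGTGG", ktup=4)
print(hit)
```

A larger Ktup makes the search faster but less sensitive, since more consecutive bases must match exactly before a region is considered at all.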

DNA homologies between two sequences can be examined 
graphically using the Harr method of constructing dot 
matrix homology plots (Needleman, S.B. and Wunsch, C.D., J. 
Mol. Biol. 48:443 (1970)). This method produces a 
two-dimensional plot which can be useful in determining 
regions of homology versus regions of repetition. 
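
A text-only dot matrix can be sketched as follows, to show how homology and repetition look different in such a plot. The function name `dot_matrix` and the exact-match window rule are assumptions for this sketch; published dot-plot methods typically use scored rather than exact window matches.

```python
def dot_matrix(seq_a, seq_b, window=3):
    """Build a dot matrix as a list of strings.

    Row i, column j holds '*' when the window-length words starting at
    seq_a[i] and seq_b[j] are identical, and '.' otherwise.
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = "".join(
            "*" if seq_a[i:i + window] == seq_b[j:j + window] else "."
            for j in range(len(seq_b) - window + 1)
        )
        rows.append(row)
    return rows

# A homologous region appears as a single diagonal ...
print("\n".join(dot_matrix("ACGT", "ACGT", window=2)))
# ... while a repeat produces additional parallel diagonals.
print("\n".join(dot_matrix("ACAC", "ACAC", window=2)))
```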

However, in a class of preferred embodiments, step (a) 
is implemented by processing the library data in the 
commercially available computer program known as the 
INHERIT 670 Sequence Analysis System, available from 
Applied Biosystems Inc. (Foster City, California), 
including the software known as the Factura software (also 
available from Applied Biosystems Inc.). The Factura 
program preprocesses each library sequence to "edit out" 
portions thereof which are not likely to be of interest, 
such as the vector used to prepare the library. Additional 
sequences which can be edited out or masked (ignored by the 
search tools) include but are not limited to the polyA tail 
and repetitive GAG and CCC sequences. A low-end search 
program can be written to mask out such "low-information" 
sequences, or programs such as BLAST can ignore the 
low-information sequences. 
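
As an illustration of masking a "low-information" region, the sketch below replaces a trailing polyA run with N characters so that a search tool can ignore it. It does not reproduce the Factura program; the name `mask_poly_a_tail` and the minimum run length of 10 are assumptions chosen for the example.

```python
import re

def mask_poly_a_tail(seq, min_run=10):
    """Mask a trailing run of at least min_run A's with N's.

    The sequence length is preserved so that base positions still
    correspond to the original read.
    """
    match = re.search(r"A{%d,}$" % min_run, seq)
    if match is None:
        return seq
    return seq[:match.start()] + "N" * (len(seq) - match.start())

print(mask_poly_a_tail("ACGT" + "A" * 12))
print(mask_poly_a_tail("ACGTAAA"))  # too short to count as a tail
```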

In the algorithm implemented by the INHERIT 670 
Sequence Analysis System, the Pattern Specification 
Language (developed by TRW Inc.) is used to determine 
regions of homology. "There are three parameters that 
determine how INHERIT analysis runs sequence comparisons: 
window size, window offset and error tolerance. Window 
size specifies the length of the segments into which the 
query sequence is subdivided. Window offset specifies 
where to start the next segment [to be compared], counting 
from the beginning of the previous segment. Error 
tolerance specifies the total number of insertions, 
deletions and/or substitutions that are tolerated over the 
specified word length. Error tolerance may be set to any 
integer between 0 and 6. The default settings are window 
tolerance=20, window offset=10 and error tolerance=3." 
INHERIT Analysis Users Manual, pp. 2-15, Version 1.0, 
Applied Biosystems, Inc., October 1991. 
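
The three parameters can be pictured with the following sketch. It is not the INHERIT implementation or the Pattern Specification Language: the function names are hypothetical, the default of 20 quoted above is assumed to refer to window size, and trailing bases that do not fill a whole window are simply dropped here.

```python
def query_windows(query, window_size=20, window_offset=10):
    """Split a query into (start, segment) pairs of length window_size,
    each starting window_offset bases after the previous one."""
    return [
        (start, query[start:start + window_size])
        for start in range(0, len(query) - window_size + 1, window_offset)
    ]

def edit_distance(a, b):
    """Number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        cur = [i]
        for j, char_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                      # deletion
                           cur[j - 1] + 1,                   # insertion
                           prev[j - 1] + (char_a != char_b)))  # substitution
        prev = cur
    return prev[-1]

def within_tolerance(segment, candidate, error_tolerance=3):
    """True if candidate matches segment within the allowed errors."""
    return edit_distance(segment, candidate) <= error_tolerance

starts = [start for start, _ in query_windows("A" * 45)]
print(starts)
```

With an offset smaller than the window size, consecutive segments overlap, so a homologous region that straddles one window boundary is still covered whole by a neighbouring window.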

Using a combination of these three parameters, a 
database (such as a DNA database) can be searched for 
sequences containing regions of homology and the 
appropriate sequences are scored with an initial value. 
Subsequently, these homologous regions are examined using 
dot matrix homology plots to determine regions of homology 
versus regions of repetition. Smith-Waterman alignments 
can be used to display the results of the homology search. 
The INHERIT software can be executed by a Sun computer 
system programmed with the UNIX operating system. 

Search alternatives to INHERIT include the BLAST 
program, GCG (available from the Genetics Computer Group, 
WI) and the Dasher program (Temple Smith, Boston 
University, Boston, MA). Nucleotide sequences can be 
searched against Genbank, EMBL or custom databases such as 
GENESEQ (available from Intelligenetics, Mountain View, CA) 
or other databases for genes. In addition, we have 
searched some sequences against our own in-house database. 

In preferred embodiments, the INHERIT software analyzes 
the transcript sequences for best conformance with a 
reference gene transcript, assigning each a sequence 
identifier and a degree of homology. Together these 
constitute the identified sequence value, which is input 
into, and further processed by, a Macintosh personal 
computer (available from Apple) programmed with an 
"abundance sort and subtraction analysis" computer program 
(to be described below). 

Prior to the abundance sort and subtraction analysis 
program (also denoted as the "abundance sort" program), 
identified sequences from the cDNA clones are assigned a 
value (according to the parameters given above) by degree 
of match according to the following categories: "exact" 
matches (regions with a high degree of identity), 
homologous human matches (regions of high similarity, but 
not "exact" matches), homologous non-human matches (regions 
of high similarity present in species other than human), or 
non-matches (no significant regions of homology to 
previously identified nucleotide sequences stored in the 
form of the database). Alternately, the degree of match 
can be a numeric value as described below. 

With reference again to the step of identifying 
matches between reference sequences and database entries, 
protein and peptide sequences can be deduced from the 
nucleic acid sequences. Using the deduced polypeptide 
sequence, the match identification can be performed in a 
manner analogous to that done with cDNA sequences. A 
protein sequence is used as a query sequence and compared 
to the previously identified sequences contained in a 
database such as the Swiss/Prot, PIR and the NBRF Protein 
database to find homologous proteins. These proteins are 
initially scored for homology using a homology score table 
(Orcutt, B.C. and Dayhoff, M.O., Scoring Matrices, PIR 
Report MAT-0285 (February 1985)), resulting in an INIT 
score. The homologous regions are aligned to obtain the 
highest matching scores by inserting a gap which adds a 
probable deleted portion. The matching score is 
recalculated using the homology score table and the 
insertion score table, resulting in an optimized (OPT) 
score. Even in the absence of knowledge of the proper 
reading frame of an isolated sequence, the above-described 
protein homology search may be performed by searching all 3 
reading frames. 
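
Deducing candidate protein sequences in all three reading frames can be sketched as follows. The sketch assumes the standard genetic code and translates only the strand given; the name `translate_three_frames` and the use of "X" for codons containing ambiguous bases are choices made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}

def translate_three_frames(dna):
    """Translate a DNA sequence in reading frames 0, 1 and 2.

    Stop codons are written as '*'; codons with unrecognized bases
    (for example N) are written as 'X'.
    """
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(codon, "X") for codon in codons))
    return frames

frames = translate_three_frames("ATGGCCTAA")
print(frames)
```

Each of the three translations can then be used as a query against a protein database, and the frame giving the best-scoring match is taken as the likely reading frame.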

Peptide and protein sequence homologies can also be 
ascertained using the INHERIT 670 Sequence Analysis System 
in an analogous way to that used in DNA sequence 
homologies. Pattern Specification Language and parameter 
windows are used to search protein databases for sequences 
containing regions of homology which are scored with an 
initial value. Subsequent display in a dot-matrix homology 
plot shows regions of homology versus regions of 
repetition. Additional search tools that are available to 
use on pattern search databases include PLsearch Blocks 
(available from Henikoff & Henikoff, University of 
Washington, Seattle), Dasher and GCG. Pattern search 
databases include, but are not limited to, Protein Blocks 
(available from Henikoff & Henikoff, University of 
Washington, Seattle), Brookhaven Protein (available from 
the Brookhaven National Laboratory, Brookhaven, MA), 
PROSITE (available from Amos Bairoch, University 
Switzerland), ProDom (available from Temple Smith, Boston 
University), and protein motif fingerprint (available from 
University of Leeds, United Kingdom). 

The ABI Assembler application software, part of the 
INHERIT DNA analysis system (available from Applied 
Biosystems, Inc., Foster City, CA), can be employed to 
create and manage sequence assembly projects by assembling 
data from selected sequence fragments into a larger 
sequence. The Assembler software combines two advanced 
computer technologies which maximize the ability to 
assemble sequenced DNA fragments into Assemblages, a 
special grouping of data where the relationships between 
sequences are shown by graphic overlap, alignment and 
statistical views. The process is based on the 
Meyers-Kececioglu model of fragment assembly (INHERIT™ 
Assembler User's Manual, Applied Biosystems, Inc., Foster 
City, CA), and uses graph theory as the foundation of a 
very rigorous multiple sequence alignment engine for 
assembling DNA sequence fragments. Other assembly programs 
that can be used include MEGALIGN (available from DNASTAR 
Inc., Madison, WI), Dasher and STADEN (available from Roger 
Staden, Cambridge, England). 

Next, with reference to Fig. 2, we describe in more 
detail the "abundance sort" program which implements 
above-mentioned "step (b)" to tabulate the number of 
sequences of the library which match each database entry 
(the "abundance number" for each database entry). 

Fig. 2 is a flow chart of a preferred embodiment of 
the abundance sort program. A source code listing of this 
embodiment of the abundance sort program is set forth in 
Table 5. In the Table 5 implementation, the abundance sort 
program is written using the FoxBASE programming language 
commercially available from Microsoft Corporation. 
Although FoxBASE was the program chosen for the first 
iteration of this technology, it should not be considered 
limiting. Many other programming languages, Sybase being a 
particularly desirable alternative, can also be used, as 
will be obvious to one with ordinary skill in the art. The 
subroutine names specified in Fig. 2 correspond to 
subroutines listed in Table 5. 

With reference again to Fig. 2, the "Identified 
Sequences" are transcript sequences representing each 
sequence of the library and a corresponding identification 
of the database entry (if any) which it matches. In other 
words, the "Identified Sequences" are transcript sequences 
representing the output of above-discussed "step (a)." 

Fig. 3 is a block diagram of a system for implementing 
the invention. The Fig. 3 system includes library 
generation unit 2 which generates a library and asserts an 
output stream of transcript sequences indicative of the 
biological sequences comprising the library. Programmed 
processor 4 receives the data stream output from unit 2 and 
processes this data in accordance with above-discussed 
"step (a)" to generate the Identified Sequences. Processor 
4 can be a processor programmed with the commercially 
available computer program known as the INHERIT 670 
Sequence Analysis System and the commercially available 
computer program known as the Factura program (both 
available from Applied Biosystems Inc.) and with the UNIX 
operating system. 
Still with reference to Fig. 3, the Identified 
Sequences are loaded into processor 6 which is programmed 
with the abundance sort program. Processor 6 generates the 
Final Transcript sequences indicated in both Figs. 2 and 3. 
Fig. 4 shows a more detailed block diagram of a planned 
relational computer system, including various searching 
techniques which can be implemented, along with an 
assortment of databases to query against. 

With reference to Fig. 2, the abundance sort program 
first performs an operation known as "Tempnum" on the 
Identified Sequences, to discard all of the Identified 
Sequences except those which match database entries of 
selected types. For example, the Tempnum process can 
select Identified Sequences which represent matches of the 
following types with database entries (see above for 
definition): "exact" matches, human "homologous" matches, 
"other species" matches (representing genes present in 
species other than human), "no" matches (no significant 
regions of homology with database entries representing 
previously identified nucleotide sequences), "I" matches 
(Incyte, for not previously known DNA sequences), or "X" 
matches (matches ESTs in reference database). This 
eliminates the U, S, M, V, A, R and D sequences (see Table 1 
for definitions). 
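
The effect of the "Tempnum" selection can be sketched as a simple filter on designation codes. This is not the Table 5 subroutine; the record layout (a dictionary with "id" and "designation" keys) and the function name `tempnum` are assumptions for the example, and only the discarded codes named above are taken from the text.

```python
# Designation codes eliminated by the selection described above
# (their meanings are defined in Table 1).
DISCARDED_CODES = {"U", "S", "M", "V", "A", "R", "D"}

def tempnum(identified_sequences):
    """Keep only records whose designation code is not discarded."""
    return [
        record for record in identified_sequences
        if record["designation"] not in DISCARDED_CODES
    ]

# Hypothetical identified sequences, one record per clone.
records = [
    {"id": "clone1", "designation": "I"},
    {"id": "clone2", "designation": "U"},
    {"id": "clone3", "designation": "X"},
    {"id": "clone4", "designation": "R"},
]
kept = tempnum(records)
print([record["id"] for record in kept])
```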

The identified sequence values selected during the 
"Tempnum" process then undergo a further selection (weeding 
out) operation known as "Tempred." This operation can, for 
example, discard all identified sequence values 
representing matches with selected database entries. 

The identified sequence values selected during the 
"Tempred" process are then classified according to library, 
during the "Tempdesig" operation. It is contemplated that 
the "Identified Sequences" can represent sequences from a 
single library, or from two or more libraries. 

Consider first the case that the identified sequence 
values represent sequences from a single library. In this 
case, all the identified sequence values determined during 
"Tempred" undergo sorting in the "Templib" operation, 
further sorting in the "Libsort" operation, and finally 
additional sorting in the "Temptarsort" operation. For 
example, these three sorting operations can sort the 
identified sequences in order of decreasing "abundance 
number" (to generate a list of decreasing abundance 
numbers, each abundance number corresponding to a unique 
identified sequence entry, or several lists of decreasing 
abundance numbers, with the abundance numbers in each list 
corresponding to database entries of a selected type) with 
redundancies eliminated from each sorted list. In this 
case, the operation identified as "Cruncher" can be 
bypassed, so that the "Final Data" values are the organized 
transcript sequences produced during the "Temptarsort" 
operation. 

We next consider the case that the transcript 
sequences produced during the "Tempred" operation represent 
sequences from two libraries (which we will denote the 
"target" library and the "subtractant" library) . For 
example, the target library may consist of cDNA sequences 
from clones of a diseased cell, while the subtractant 
library may consist of cDNA sequences from clones of the 
diseased cell after treatment by exposure to a drug. For 
another example, the target library may consist of cDNA 
sequences from clones of a cell type from a young human, 
while the subtractant library may consist of cDNA sequences 
from clones of the same cell type from the same human at 
different ages. 

In this case, the "Tempdesig" operation routes all 
transcript sequences representing the target library for 
processing in accordance with "Templib" (and then "Libsort" 
and "Temptarsort"), and routes all transcript sequences 
representing the subtractant library for processing in 
accordance with "Tempsub" (and then "Subsort" and 
"Tempsubsort"). For example, the consecutive "Templib," 
"Libsort," and "Temptarsort" sorting operations sort 
identified sequences from the target library in order of 
decreasing abundance number (to generate a list of 
decreasing abundance numbers, each abundance number 
corresponding to a database entry, or several lists of 
decreasing abundance numbers, with the abundance numbers in 
each list corresponding to database entries of a selected 
type) with redundancies eliminated from each sorted list. 
The consecutive "Tempsub," "Subsort," and "Tempsubsort" 
sorting operations sort identified sequences from the 
subtractant library in order of decreasing abundance number 
(to generate a list of decreasing abundance numbers, each 
abundance number corresponding to a database entry, or 
several lists of decreasing abundance numbers, with the 
abundance numbers in each list corresponding to database 
entries of a selected type) with redundancies eliminated 
from each sorted list. 

The transcript sequences output from the "Temptarsort" 
operation typically represent sorted lists from which a 
histogram could be generated in which position along one 
(e.g., horizontal) axis indicates abundance number (of 
target library sequences), and position along another 
(e.g., vertical) axis indicates identified sequence value 
(e.g., human or non-human gene type). Similarly, the 
transcript sequences output from the "Tempsubsort" 
operation typically represent sorted lists from which a 
histogram could be generated in which position along one 
(e.g., horizontal) axis indicates abundance number (of 
subtractant library sequences), and position along another 
(e.g., vertical) axis indicates identified sequence value 
(e.g., human or non-human gene type). 

The transcript sequences (sorted lists) output from 
the Tempsubsort and Temptarsort sorting operations are 
combined during the operation identified as "Cruncher." 
The "Cruncher" process identifies pairs of corresponding 
target and subtractant abundance numbers (both representing 
the same identified sequence value), and divides one by the 
other to generate a "ratio" value for each pair of 
corresponding abundance numbers, and then sorts the ratio 
values in order of decreasing ratio value. The data output 
from the "Cruncher" operation (the Final Transcript 
sequence in Fig. 2) is typically a sorted list from which a 
histogram could be generated in which position along one 
axis indicates the size of a ratio of abundance numbers 
(for corresponding identified sequence values from target 
and subtractant libraries) and position along another axis 
indicates identified sequence value (e.g., gene type). 

Preferably, prior to obtaining a ratio between the two 
library abundance values, the Cruncher operation also 
divides each abundance number by the total number of 
sequences in its library (for one or both of the target and 
subtractant libraries). 
The resulting lists of "relative" ratio values generated by 
the Cruncher operation are useful for many medical, 
scientific, and industrial applications. Also preferably, 
the output of the Cruncher operation is a set of lists, 
each list representing a sequence of decreasing ratio 
values for a different selected subset (e.g. protein 
family) of database entries. 

In one example, the abundance sort program of the 
invention tabulates for a library the numbers of mRNA 
transcripts corresponding to each gene identified in a 
database. These numbers are divided by the total number of 
clones sampled. The results of the division reflect the 
relative abundance of the mRNA transcripts in the cell type 
or tissue from which they were obtained. Obtaining this 
final data set is referred to herein as "gene transcript 
image analysis." The resulting subtracted data show 
exactly what proteins and genes are upregulated and 
downregulated in highly detailed complexity. 

6.6. HUVEC cDNA LIBRARY 

Table 2 is an abundance table listing the various gene 
transcripts in an induced HUVEC library. The transcripts 
are listed in order of decreasing abundance. This 
computerized sorting simplifies analysis of the tissue and 
speeds identification of significant new proteins which are 
specific to this cell type. This type of endothelial cell 
lines tissues of the cardiovascular system, and the more 
that is known about its composition, particularly in 
response to activation, the more choices of protein targets 
become available to affect in treating disorders of this 
tissue, such as the highly prevalent atherosclerosis. 

6.7. MONOCYTE-CELL AND MAST-CELL cDNA LIBRARIES 

Tables 3 and 4 show truncated comparisons of two 
libraries. In Tables 3 and 4 the "normal monocytes" are 
the HMC-1 cells, and the "activated macrophages" are the 
THP-1 cells pretreated with PMA and activated with LPS. 
Table 3 lists in descending order of abundance the most 
abundant gene transcripts for both cell types. With only 
15 gene transcripts from each cell type, this table permits 
quick, qualitative comparison of the most common 
transcripts. This abundance sort, with its convenient 
side-by-side display, provides an immediately useful 
research tool. In this example, this research tool 
discloses that 1) only one of the top 15 activated 
macrophage transcripts is found in the top 15 normal 
monocyte gene transcripts (poly A binding protein); and 2) 
a new gene transcript (previously unreported in other 
databases) is relatively highly represented in activated 
macrophages but is not similarly prominent in normal 
macrophages. Such a research tool provides researchers 
with a short-cut to new proteins, such as receptors, cell- 
surface and intracellular signalling molecules, which can 
serve as drug targets in commercial drug screening 
programs. Such a tool could save considerable time over 
that consumed by a hit and miss discovery program aimed at 
identifying important proteins in and around cells, because 
those proteins carrying out everyday cellular functions and 
represented as steady state mRNA are quickly eliminated 
from further characterization. 

This illustrates how the gene transcript profiles 
change with altered cellular function. Those skilled in 
the art know that the biochemical composition of cells also 
changes with other functional changes such as cancer, 
including cancer's various stages, and exposure to 
toxicity. A gene transcript subtraction profile such as in 
Table 3 is useful as a first screening tool for such gene 
expression and protein studies. 

6.8. SUBTRACTION ANALYSIS OP NORMAL MONOCYTE-CELL AND 
ACTIVATE D MONOCYTE CELL cDNA LIBRARIES 

Once the cDNA data are in the computer, the computer
program as disclosed in Table 5 was used to obtain ratios
of all the gene transcripts in the two libraries discussed
in Example 6.7, and the gene transcripts were sorted by the
descending values of their ratios. If a gene transcript is
not represented in one library, that gene transcript's
abundance is unknown but appears to be less than 1. As an
approximation, and to obtain a ratio (which would not be
possible if the unrepresented gene were given an abundance
of zero), genes which are represented in only one of the
two libraries are assigned an abundance of 1/2. Using 1/2
for unrepresented clones increases the relative importance
of "turned-on" and "turned-off" genes, whose products would
be drug candidates. The resulting print-out is called a
subtraction table and is an extremely valuable screening
method, as is shown by the following data.
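
The ratio computation described in the preceding paragraph can be
sketched as follows. This Python fragment is an illustrative
assumption, not a transcription of the FoxBASE program of Table 5:
the function name subtraction_table, the dictionary keys and the
toy data are hypothetical, raw clone counts are used as the
abundance, and transcripts absent from either library are both
given the stand-in abundance of 1/2.

```python
from collections import Counter

PSEUDO_ABUNDANCE = 0.5  # value the text assigns to an unrepresented transcript


def subtraction_table(target_clones, background_clones):
    """Rank transcripts by the ratio target abundance / background abundance."""
    target = Counter(target_clones)
    background = Counter(background_clones)
    rows = []
    for entry in set(target) | set(background):
        t = target.get(entry, 0)
        b = background.get(entry, 0)
        # A count of zero would make the ratio undefined (or zero),
        # so 1/2 is substituted on whichever side is missing.
        ratio = (t or PSEUDO_ABUNDANCE) / (b or PSEUDO_ABUNDANCE)
        rows.append({"entry": entry, "target": t, "bgfreq": b, "ratio": ratio})
    # Descending ratio, then descending background frequency, then name.
    rows.sort(key=lambda r: (-r["ratio"], -r["bgfreq"], r["entry"]))
    return rows


# Hypothetical toy libraries: one element per sequenced clone.
activated = ["A"] * 4 + ["B"] * 2 + ["C"]
normal = ["B"] * 2 + ["C"] * 4 + ["D"]

table = subtraction_table(activated, normal)
```

Transcript "A", present only in the target library, rises to the
top of the sorted table, which is the effect the 1/2 substitution
is intended to have on "turned-on" genes.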

Table 4 is a subtraction table, in which the normal
monocyte library was electronically "subtracted" from the
activated macrophage library. This table highlights most
effectively the changes in abundance of the gene
transcripts by activation of macrophages. Even among the
first 20 gene transcripts listed, there are several unknown
gene transcripts. Thus, electronic subtraction is a useful
tool with which to assist researchers in identifying much
more quickly the basic biochemical changes between two cell
types. Such a tool can save universities and
pharmaceutical companies, which spend billions of dollars on
research, valuable time and laboratory resources at the
early discovery stage and can speed up the drug development
cycle, which in turn permits researchers to set up drug
screening programs much earlier. Thus, this research tool
provides a way to get new drugs to the public faster and
more economically.

Also, such a subtraction table can be obtained for
patient diagnosis. An individual patient sample (such as
monocytes obtained from a biopsy or blood sample) can be
compared with data provided herein to diagnose conditions
associated with macrophage activation.

Table 4 uncovered many new gene transcripts (labeled
Incyte clones). Note that many genes are turned on in the
activated macrophage (i.e., the monocyte had a 0 in the
bgfreq column). This screening method is superior to other
screening techniques, such as the western blot, which are
incapable of uncovering such a multitude of discrete new
gene transcripts.

The subtraction-screening technique has also uncovered
a high number of cancer gene transcripts (oncogenes rho,
ETS2, rab-2 ras, YPT1-related, and acute myeloid leukemia
mRNA) in the activated macrophage. These transcripts may
be attributed to the use of immortalized cell lines and are
inherently interesting for that reason. This screening
technique offers a detailed picture of upregulated
transcripts including oncogenes, which helps explain why
anti-cancer drugs interfere with the patient's immunity
mediated by activated macrophages. Armed with knowledge
gained from this screening method, those skilled in the art
can set up more targeted, more effective drug screening
programs to identify drugs which are differentially
effective against 1) both relevant cancers and activated
macrophage conditions with the same gene transcript
profile; 2) cancer alone; and 3) activated macrophage
conditions.

Smooth muscle senescent protein (22 kd) was 
upregulated in the activated macrophage, which indicates 
that it is a candidate to block in controlling 
inflammation. 






WO 95/20681 



PCT/US95/01160 



6.9. SUBTRACTION ANALYSIS OF NORMAL LIVER CELLS AND
HEPATITIS INFECTED LIVER CELL cDNA LIBRARIES

In this example, rats are exposed to hepatitis virus
and maintained in the colony until they show definite signs
of hepatitis. Of the rats diagnosed with hepatitis, one
half of the rats are treated with a new anti-hepatitis
agent (AHA). Liver samples are obtained from all rats
before exposure to the hepatitis virus and at the end of
AHA treatment or no treatment. In addition, liver samples
can be obtained from rats with hepatitis just prior to AHA
treatment.

The liver tissue is treated as described in Examples
6.2 and 6.3 to obtain mRNA and subsequently to sequence
cDNA. The cDNA from each sample is processed and analyzed
for abundance according to the computer program in Table 5.
The resulting gene transcript images of the cDNA provide
detailed pictures of the baseline (control) for each animal
and of the infected and/or treated state of the animals.
cDNA data for a group of samples can be combined into a
group summary gene transcript profile for all control
samples, all samples from infected rats and all samples
from AHA-treated rats.

Subtractions are performed between appropriate
individual libraries and the grouped libraries. For
individual animals, control and post-study samples can be
subtracted. Also, if samples are obtained before and after
AHA treatment, the data from individual animals and
treatment groups can be subtracted. In addition, the data
for all control samples can be pooled and averaged. The
control average can be subtracted from averages of both
post-study AHA and post-study non-AHA cDNA samples. If
pre- and post-treatment samples are available, pre- and
post-treatment samples can be compared individually (or
electronically averaged) and subtracted.
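
A minimal Python sketch of the pooling and group subtraction just
described is given below. It is offered under stated assumptions
only: each animal's gene transcript profile is assumed to be a
dictionary of transcript name to clone count, the helper names
(average_profile, subtract_profiles), the transcript labels and
the counts are hypothetical, and the 1/2 stand-in abundance of
Example 6.8 is reused for transcripts missing from one group.

```python
from collections import Counter


def average_profile(profiles):
    """Average per-transcript counts over a group of sample profiles."""
    totals = Counter()
    for profile in profiles:
        totals.update(profile)  # adds the counts transcript by transcript
    return {entry: total / len(profiles) for entry, total in totals.items()}


def subtract_profiles(study, control, pseudo=0.5):
    """Return (entry, ratio) pairs sorted by descending study/control ratio."""
    entries = set(study) | set(control)
    ratios = {
        e: (study.get(e) or pseudo) / (control.get(e) or pseudo)
        for e in entries
    }
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical per-animal profiles (transcript name -> clone count).
controls = [{"ALB": 10, "CYP": 4}, {"ALB": 12, "CYP": 2}]
infected = [{"ALB": 5, "CYP": 3, "IFN": 6}, {"ALB": 6, "CYP": 3, "IFN": 2}]

control_avg = average_profile(controls)
infected_avg = average_profile(infected)
group_subtraction = subtract_profiles(infected_avg, control_avg)
```

The same two helpers could be applied to a single animal's
control and post-study profiles, or to the AHA-treated and
untreated group averages.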

These subtraction tables are used in two general ways.
First, the differences are analyzed for gene transcripts
which are associated with continuing hepatic deterioration
or healing. The subtraction tables are tools to isolate
the effects of the drug treatment from the underlying basic
pathology of hepatitis. Because hepatitis affects many
parameters, additional liver toxicity has been difficult to
detect with only blood tests for the usual enzymes. The
gene transcript profile and subtraction provide a much
more complex biochemical picture which researchers have
needed to analyze such difficult problems.

Second, the subtraction tables provide a tool for
identifying clinical markers, individual proteins or other
biochemical determinants which are used to predict and/or
evaluate a clinical endpoint, such as disease, improvement
due to the drug, and even additional pathology due to the
drug. The subtraction tables specifically highlight genes
which are turned on or off. Thus, the subtraction tables
provide a first screen for a set of gene transcript
candidates for use as clinical markers. Subsequent
electronic subtractions of additional cell and tissue
libraries reveal which of the potential markers are in fact
found in different cell and tissue libraries. Candidate
gene transcripts found in additional libraries are removed
from the set of potential clinical markers. Then, tests of
blood or other relevant samples which are known to lack or
to have the relevant condition are compared to validate the
selection of the clinical marker. In this method, the
particular physiologic function of the protein transcript
need not be determined to qualify the gene transcript as a
clinical marker.
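
The first-screen and elimination steps described above can be
sketched in Python as follows. This is an illustrative sketch
only: the row layout (keys "entry", "target" and "bgfreq"), the
function names and the transcript labels are hypothetical, and a
gene is treated as "turned on or off" when it is absent from
exactly one of the two subtracted libraries.

```python
def candidate_markers(subtraction_rows):
    """Keep transcripts that are absent from exactly one of the two libraries."""
    return {
        row["entry"]
        for row in subtraction_rows
        if (row["target"] == 0) != (row["bgfreq"] == 0)
    }


def filter_markers(candidates, other_libraries):
    """Drop candidates that are also found in any additional library."""
    seen_elsewhere = set().union(*other_libraries)
    return sorted(candidates - seen_elsewhere)


# Hypothetical subtraction rows (counts in the study and control libraries).
rows = [
    {"entry": "X1", "target": 5, "bgfreq": 0},  # turned on
    {"entry": "X2", "target": 3, "bgfreq": 3},  # unchanged
    {"entry": "X3", "target": 0, "bgfreq": 4},  # turned off
    {"entry": "X4", "target": 2, "bgfreq": 0},  # turned on, but seen elsewhere
]
# Hypothetical additional cell and tissue libraries (sets of entries).
other_libraries = [{"X4", "Y1"}, {"Y2"}]

candidates = candidate_markers(rows)
markers = filter_markers(candidates, other_libraries)
```

The surviving entries would then be carried forward to validation
against samples known to lack or to have the relevant condition.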

6.10. ELECTRONIC NORTHERN BLOT

One limitation of electronic subtraction is that it is
difficult to compare more than a pair of images at once.
Once particular individual gene products are identified as
relevant to further study (via electronic subtraction or
other methods), it is useful to study the expression of
single genes in a multitude of different tissues. In the
lab, the technique of "Northern" blot hybridization is used
for this purpose. In this technique, a single cDNA, or a
probe corresponding thereto, is labeled and then hybridized
against a blot containing RNA samples prepared from a
multitude of tissues or cell types. Upon autoradiography,
the pattern of expression of that particular gene, one at a
time, can be quantitated in all the included samples.

In contrast, a further embodiment of this invention is
the computerized form of this process, termed here
"electronic northern blot." In this variation, a single
gene is queried for expression against a multitude of
prepared and sequenced libraries present within the
database. In this way, the pattern of expression of any
single candidate gene can be examined instantaneously and
effortlessly. More candidate genes can thus be scanned,
leading to more frequent and fruitfully relevant
discoveries. The computer program included as Table 5
includes a program for performing this function, and Table
6 is a partial listing of entries of the database used in
the electronic northern blot analysis.
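
The single-gene query described above can be illustrated with the
following Python sketch. It is not the Northern program of Table
5: the function name electronic_northern and the clone counts are
hypothetical, each library is assumed to be a list with one entry
name per sequenced clone, and expression is reported both as a
raw count and as a percentage of the clones in that library.

```python
def electronic_northern(entry, libraries):
    """Report how often one entry occurs in each sequenced library."""
    rows = []
    for name, clones in libraries.items():
        count = sum(1 for clone in clones if clone == entry)
        percent = 100.0 * count / len(clones) if clones else 0.0
        rows.append((name, count, percent))
    # Highest relative expression first, library name as tie-breaker.
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows


# Library names follow Tables 2 and 4; the clone lists are made up.
libraries = {
    "HUVEC": ["HSFIB1"] * 3 + ["HSRPL41"],
    "THP-1": ["HUMIL1"] * 4,
    "HMC": ["HSFIB1"] + ["HSRPL41"] * 3,
}

blot = electronic_northern("HSFIB1", libraries)
```

Each returned row plays the role of one lane of a conventional
Northern blot, with the percentage standing in for band
intensity.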

6.11. PHASE I CLINICAL TRIALS 
Based on the establishment of safety and effectiveness
in the above animal tests, Phase I clinical tests are
undertaken. Normal patients are subjected to the usual
preliminary clinical laboratory tests. In addition,
appropriate specimens are taken and subjected to gene
transcript analysis. Additional patient specimens are
taken at predetermined intervals during the test. The
specimens are subjected to gene transcript analysis as
described above. In addition, the gene transcript changes
noted in the earlier rat toxicity study are carefully
evaluated as clinical markers in the followed patients.
Changes in the gene transcript analyses are evaluated as
indicators of toxicity by correlation with clinical signs
and symptoms and other laboratory results. In addition,
subtraction is performed on individual patient specimens
and on averaged patient specimens. The subtraction
analysis highlights any toxicological changes in the
treated patients. This is a highly refined determinant of
toxicity. The subtraction method also annotates clinical
markers. Further subgroups can be analyzed by subtraction
analysis, including, for example, 1) segregation by
occurrence and type of adverse effect; and 2) segregation
by dosage.

6.12. GENE TRANSCRIPT IMAGING ANALYSIS IN CLINICAL STUDIES
A gene transcript imaging analysis (or multiple gene
transcript imaging analyses) is a useful tool in other
clinical studies. For example, the differences in gene
transcript imaging analyses before and after treatment can
be assessed for patients on placebo and drug treatment.
This method also effectively screens for clinical markers
to follow in clinical use of the drug.

6.13. COMPARATIVE GENE TRANSCRIPT ANALYSIS BETWEEN SPECIES

The subtraction method can be used to screen cDNA
libraries from diverse sources. For example, the same cell
types from different species can be compared by gene
transcript analysis to screen for specific differences,
such as in detoxification enzyme systems. Such testing
aids in the selection and validation of an animal model for
the commercial purpose of drug screening or toxicological
testing of drugs intended for human or animal use. When
the comparison between animals of different species is
shown in columns for each species, we refer to this as an
interspecies comparison, or zoo blot.

Embodiments of this invention may employ databases
such as those written using the FoxBASE programming
language commercially available from Microsoft Corporation.
Other embodiments of the invention employ other databases,
such as a random peptide database, a polymer database, a
synthetic oligomer database, or an oligonucleotide database
of the type described in U.S. Patent 5,270,170, issued
December 14, 1993 to Cull, et al., PCT International
Application Publication No. WO 9322684, published November
11, 1993, PCT International Application Publication No. WO
9306121, published April 1, 1993, or PCT International
Application Publication No. WO 9119818, published December
26, 1991. These four references (whose text is
incorporated herein by reference) include teaching which
may be applied in implementing such other embodiments of
the present invention.

All references referred to in the preceding text are
hereby expressly incorporated by reference herein.
Various modifications and variations of the described
method and system of the invention will be apparent to
those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has
been described in connection with specific preferred
embodiments, it should be understood that the invention as
claimed should not be unduly limited to such specific
embodiments.
TABLE 2



Clone numbers 15000 through 20000
Libraries: HUVEC
Arranged by ABUNDANCE
Total clones analyzed: 5000
319 genes, for a total of 1713 clones

       number    N   entry       descriptor
  1    15365    67   HSRPL41     Riboptn L41
  2    15004    65   NCY015004   INCYTE 015004
  3    15638    63   NCY015638   INCYTE 015638
  4    15390    50   NCY015390   INCYTE 015390
  5    15193    47   HSFIB1      Fibronectin
  6    15220    47   RRRPL9      Riboptn L9
  7    15280    47   NCY015280   INCYTE 015280
  8    15583    33   M62060      EST HHCH09 (IGR)
  9    15662    31   HSACTCGR    Actin, gamma
 10    15026    29   NCY015026   INCYTE 015026
 11    15279    24   HSEF1AR     Elf 1-alpha
 12    15027    23   NCY015027   INCYTE 015027
 13    15033    20   NCY015033   INCYTE 015033
 14    15198    20   NCY015198   INCYTE 015198
 15    15809    20   HSCOLL1     Collagenase
 16    15221    19   NCY015221   INCYTE 015221
 17    15263    19   NCY015263   INCYTE 015263
 18    15290    19   NCY015290   INCYTE 015290
 19    15350    18   NCY015350   INCYTE 015350
 20    15030    17   NCY015030   INCYTE 015030
 21    15234    17   NCY015234   INCYTE 015234
 22    15459    16   NCY015459   INCYTE 015459
 23    15353    15   NCY015353   INCYTE 015353
 24    15378    15   S76965      Ptn kinase inhib
 25    15255    14   HUMTHYB4    Thymosin beta-4
 26    15401    14   HSLIPCR     Lipocortin I
 27    15425    14   HSPOLYAB    Poly-A bp
 28    18212    14   HUMTHYMA    Thymosin, alpha
 29    18216    14   HSMRP1      Motility relat ptn; MRP-1; CD-9
 30    15189    13   HS18D       Interferon induc ptn 1-8D
 31    15031    12   HUMFKBP     FK506 bp
 32    15306    12   HSH2AZ      Histone H2A
 33    15621    12   HUMLEC      Lectin, B-galbp, 14kDa
 34    15789    11   NCY015789   INCYTE 015789
 35    16578    11   HSRPS11     Riboptn S11
 36    16632    11   M61984      EST HHCA13 (IGR)
 37    18314    11   NCY018314   INCYTE 018314
 38    15367    10   NCY015367   INCYTE 015367
 39    15415    10   HSIFNIN1    Interferon induc mRNA
 40    15633    10   HSLDHAR     Lactate dehydrogenase
 41    15813    10   CHKNMHCB    C Myosin heavy chain B
 42    18210    10   NCY018210   INCYTE 018210
 43    18233    10   HSRPII140   RNA polymerase II
 44    18996    10   NCY018996   INCYTE 018996
 45    15088     9   HUMFERL     Ferritin, light chain
 46    15714     9   NCY015714   INCYTE 015714
 47    15720     9   NCY015720   INCYTE 015720
 48    15863     9   NCY015863   INCYTE 015863
 49    16121     9   HSET        Endothelin
 50    18252     9   NCY018252   INCYTE 018252
 51    15351     8   HUMALBP     Lipid bp, adipocyte
 52    15370     8   NCY015370   INCYTE 015370


TABLE 2 (continued)

       number    N   entry       descriptor
 53    15670     8   BTCIASHI    NADH-ubiq oxidoreductase
 54    15795     8   NCY015795   INCYTE 015795
 55    16245     8   NCY016245   INCYTE 016245
 56    18262     8   NCY018262   INCYTE 018262
 57    18321     8   HSRPL17     Riboptn L17
 58    15126     7   XLRPL1BRF   Riboptn L1
 59    15133     7   HSAC07      Actin, beta
 60    15245     7   NCY015245   INCYTE 015245
 61    15288     7   NCY015288   INCYTE 015288
 62    15294     7   HSGAPDR     G-3-PD
 63    15442     7   HUMLAMB     Laminin receptor, 54kDa
 64    15485     7   HSNGMRNA    Uracil DNA glycosylase
 65    16646     7   NCY016646   INCYTE 016646
 66    18003     7   HUMPAIA     Plsmnogen activ gene
 67    15032     6   HUMUB       Ubiquitin
 68    15267     6   HSRPS8      Riboptn S8
 69    15295     6   NCY015295   INCYTE 015295
 70    15458     6   RNRPS10R    Riboptn S10
 71    15832     6   RSGALEM     UDP-galactose epimerase
 72    15928     6   HUMAPOJ     Apolipoptn J
 73    16598     6   HUMTBBM40   Tubulin, beta
 74    18218     6   NCY018218   INCYTE 018218
 75    18499     6   HSP27       Hydrophobic ptn p27
 76    18963     6   NCY018963   INCYTE 018963
 77    18997     6   NCY018997   INCYTE 018997
 78    15432     5   HSAGALAR    Galactosidase A, alpha
 79    15475     5   NCY015475   INCYTE 015475
 80    15721     5   NCY015721   INCYTE 015721
 81    15865     5   NCY015865   INCYTE 015865
 82    16270     5   NCY016270   INCYTE 016270
 83    16886     5   NCY016886   INCYTE 016886
 84    18500     5   NCY018500   INCYTE 018500
 85    18503     5   NCY018503   INCYTE 018503
 86    19672     5   RRRPL34     Riboptn L34
 87    15086     4   XLRPL1AR    Riboptn L1a
 88    15113     4   HUMIFNWRS   tRNA synthetase,
 89    15242     4   NCY015242   INCYTE 015242
 90    15249     4   NCY015249   INCYTE 015249
 91    15377     4   NCY015377   INCYTE 015377
 92    15407     4   NCY015407   INCYTE 015407
 93    15473     4   NCY015473   INCYTE 015473
 94    15588     4   HSRPS12     Riboptn S12
 95    15684     4   HSEF1G      Elf 1-gamma
 96    15782     4   NCY015782   INCYTE 015782
 97    15916     4   HSRPS18     Riboptn S18
 98    15930     4   NCY015930   INCYTE 015930
 99    16108     4   NCY016108   INCYTE 016108
100    16133     4   NCY016133   INCYTE 016133
TABLE 4



Libraries: THP-1
Subtracting: HMC
Sorted by ABUNDANCE
Total clones analyzed: 7375
1057 genes, for a total of 2151 clones

number   entry       s   descriptor
10022    HUMIL1          IL 1-beta
10036    HSMDNCF         IL-8
10089    HSLAG1CDN       Lymphocyte activ gene
10060    HUMTCSM         RANTES
10003    HUMMIP1A        MIP-1
10689    HSOP            Osteopontin
11050    NCY011050       INCYTE 011050
10937    HSTNFR          TNF-alpha
10176    HSSOD           Superoxide dismutase
10886    HSCDW40         B-cell activ, NGF-relat
10186    HUMAPR          Early resp PMA-induc
10967    HUMGDN          PN-1, glial-deriv
11353    NCY011353       INCYTE 011353
10298    NCY010298       INCYTE 010298
10215    HUM4COLA        Collagenase, type IV
10276    NCY010276       INCYTE 010276
10488    NCY010488       INCYTE 010488
11138    NCY011138       INCYTE 011138
10037    HUMCAPPRO       Adenylate cyclase
10840    HUMADCY         Adenylate cyclase
10672    HSCD44E         Cell adhesion glptn
12837    HUMCYCLOX       Cyclooxygenase-2
10001    NCY010001       INCYTE 010001
10005    NCY010005       INCYTE 010005
10294    NCY010294       INCYTE 010294
10297    NCY010297       INCYTE 010297
10403    NCY010403       INCYTE 010403
10699    NCY010699       INCYTE 010699
10966    NCY010966       INCYTE 010966
12092    NCY012092       INCYTE 012092
12549    HSRHOB          Oncogene rho
10691    HUMARF1BA       ADP-ribosylation fctr
12106    HSADSS          Adenylosuccinate synthetase
10194    HSCATHL         Cathepsin L
10479    CLMCYCA     I   Cyclin A
10031    NCY010031       INCYTE 010031
10203    NCY010203       INCYTE 010203
10288    NCY010288       INCYTE 010288
10372    NCY010372       INCYTE 010372
10471    NCY010471       INCYTE 010471
10484    NCY010484       INCYTE 010484
10859    NCY010859       INCYTE 010859
10890    NCY010890       INCYTE 010890
11511    NCY011511       INCYTE 011511
11868    NCY011868       INCYTE 011868
12820    NCY012820       INCYTE 012820
10133    HSI1RAP         IL-1 antagonist
10516    HUMP2A          Phosphatase, regul 2A
11063    HUMB94          TNF-induc response
11140    HSHB15RNA       HB15 gene; new Ig
10788    NCY001713       INCYTE 001713
10033    NCY010033       INCYTE 010033
10035    NCY010035       INCYTE 010035
10084    NCY010084       INCYTE 010084
10236    NCY010236       INCYTE 010236
10383    NCY010383       INCYTE 010383
TABLE 4 (continued)



number   entry       s   descriptor
10450    NCY010450       INCYTE 010450
10470    NCY010470       INCYTE 010470
10504    NCY010504       INCYTE 010504
10507    NCY010507       INCYTE 010507
10598    NCY010598       INCYTE 010598
10779    NCY010779       INCYTE 010779
10909    NCY010909       INCYTE 010909
10976    NCY010976       INCYTE 010976
10985    NCY010985       INCYTE 010985
11052    NCY011052       INCYTE 011052
11068    NCY011068       INCYTE 011068
11134    NCY011134       INCYTE 011134
11136    NCY011136       INCYTE 011136
11191    NCY011191       INCYTE 011191
11219    NCY011219       INCYTE 011219
11386    NCY011386       INCYTE 011386
11403    NCY011403       INCYTE 011403
11460    NCY011460       INCYTE 011460
11618    NCY011618       INCYTE 011618
11686    NCY011686       INCYTE 011686
12021    NCY012021       INCYTE 012021
12025    NCY012025       INCYTE 012025
12320    NCY012320       INCYTE 012320
12330    NCY012330       INCYTE 012330
12853    NCY012853       INCYTE 012853
14386    NCY014386       INCYTE 014386
14391    NCY014391       INCYTE 014391
TABLE 5



* Master menu for SUBTRACTION output
SET TALK OFF
SET SAFETY OFF
SET EXACT ON
SET TYPEAHEAD TO 0
CLEAR
SET DEVICE TO SCREEN
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE ' ' TO Target1
STORE ' ' TO Target2
STORE ' ' TO Target3
STORE ' ' TO Object1
STORE ' ' TO Object2
STORE ' ' TO Object3
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO PTF
STORE 1 TO BAIL
DO WHILE .T.

* "Program, i -Subtraction 2,&tt 
Sat©..,, i 10/U/94 

• * Version, i Fo*BASE+/Mac, revision 1.10 

* Notes, . . . t Format file Subtraction 2 



HEADING -Screen l f AT 40,2 SIZE 286,492 PIXELS FONT -Geneva', 9 COLOR 0.0 0 
8 PIXELS 75,120 TO 178,241 ST*L3 3871 COLOR 0,0,-1,24610.-1,6947 ' ' ' 




6 PIXELS -198,126 GET PTF STYLE 65536 FONT 'Chicago' , 12 PICTURE "8*C .Print to file- SIZE 15" 9 
8 PIXELS 90,9 TO 1Q1,109 STXIfl 3871 COLOR 0,0,-17-25600,-1,-1 ^ reinc co SIZE 15,9 

0 PIXELS 90,28'B TO'181,397 STlfLE 3871 COLOR 0,0,-1,-25600,-1,-1 




8 PIXELS 108,299 GET Objectl STYLE 0 FONT 'Geneva', 9 SIZE 12.79-COLGR 6,6,-1.-1 -1 -1 
8 FimS X3S.299 GET object2 STYLE 0. FONT 'Ganeva',9 SIZE 12 79 COLOR . . 1-1 
8 PIXELS 162,299 GET 6bject3 OTLE 0 FWT ■Genevavg SIZE 12^79 COLOR 0 0 -1 -1 -1 Zl 
'8 PIXELS 276,324'GET Bail ST¥I£ 65536 FONT -Chicago-^ PICTURE '8*R Run,Sail oif'siZE 4112 

* EOFi Subtraction. 2. fmt 
READ
IF Bail=2
   CLEAR
   CLOSE DATABASES
   USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
   SET SAFETY ON
   SCREEN 1 OFF
   RETURN
ENDIF

STORE VAL(SYS(2)) TO STARTIME
STORE UPPER(Target1) TO Target1
STORE UPPER(Target2) TO Target2
STORE UPPER(Target3) TO Target3
STORE UPPER(Object1) TO Object1
STORE UPPER(Object2) TO Object2
STORE UPPER(Object3) TO Object3
CLEAR
SET TALK ON
GAP =
GO INITIATE
FIELDS NUMBER,library,D,F,Z,R,ENTRY,S,DESCRIPTOR,START,RFEND,I TO TEMPNUM
COUNT TO TOT
COPY TO TEMPRED FOR D='E'.OR.D='O'.OR.D='H'.OR.D='N'.OR.D='I'
USE TEMPRED
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
   COPY TO TEMPDESIG
ELSE
   COPY STRUCTURE TO TEMPDESIG
   USE TEMPDESIG
   IF Ematch=1
      APPEND FROM TEMPNUM FOR D='E'
   ENDIF
   IF Hmatch=1
      APPEND FROM TEMPNUM FOR D='H'
   ENDIF
   IF Omatch=1
      APPEND FROM TEMPNUM FOR D='O'
   ENDIF
   IF Imatch=1
      APPEND FROM TEMPNUM FOR D='I'.OR.D='X'.OR.D='N'
   ENDIF
ENDIF
COUNT TO STARTOT
COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
APPEND FROM TEMPDESIG FOR library=UPPER(target1)
IF target2<>' '
   APPEND FROM TEMPDESIG FOR library=UPPER(target2)
ENDIF
IF target3<>' '
   APPEND FROM TEMPDESIG FOR library=UPPER(target3)
ENDIF
COUNT TO ANALTOT
USE TEMPDESIG
COPY STRUCTURE TO TEMPSUB
USE TEMPSUB
APPEND FROM TEMPDESIG FOR library=UPPER(Object1)
IF target2<>' '
   APPEND FROM TEMPDESIG FOR library=UPPER(Object2)
ENDIF
IF target3<>' '
   APPEND FROM TEMPDESIG FOR library=UPPER(Object3)
ENDIF
COUNT TO SUBTRACTOT
SET TALK OFF

* COMPRESSION SUBROUTINE A
? 'COMPRESSING QUERY LIBRARY'
USE TEMPLIB
SORT ON ENTRY, NUMBER TO LIBSORT
USE LIBSORT
COUNT TO IDGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
   IF MARK1 >= IDGENE
      PACK
      COUNT TO AUNIQUE
      SW2 = 1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   STORE D TO DESIGA
   SW = 0
   DO WHILE SW=0 TEST
      SKIP
      STORE ENTRY TO TESTB
      STORE D TO DESIGB
      IF TESTA = TESTB .AND. DESIGA = DESIGB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW = 1
      LOOP
   ENDDO TEST
   LOOP
ENDDO ROLL
SORT ON RFEND/D, NUMBER TO TEMPTARSORT
USE TEMPTARSORT
*REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPTARCO

* COMPRESSION SUBROUTINE B
? 'COMPRESSING TARGET LIBRARY'
USE TEMPSUB
SORT ON ENTRY, NUMBER TO SUBSORT
USE SUBSORT
COUNT TO SUBGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
   IF MARK1 >= SUBGENE
      PACK
      COUNT TO SUNIQUE
      SW2 = 1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   STORE D TO DESIGA
   SW = 0
   DO WHILE SW=0 TEST
      SKIP
      STORE ENTRY TO TESTB
      STORE D TO DESIGB
      IF TESTA = TESTB .AND. DESIGA = DESIGB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW = 1
      LOOP
   ENDDO TEST
   LOOP
ENDDO ROLL
SORT ON RFEND/D, NUMBER TO TEMPSUBSORT
USE TEMPSUBSORT
*REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPSUBCO

* FUSION ROUTINE
? 'SUBTRACTING LIBRARIES'
USE SUBTRACTION
COPY STRUCTURE TO CRUNCHER
SELECT 2
USE TEMPSUBSORT
SELECT 1
USE CRUNCHER
APPEND FROM TEMPTARSORT
COUNT TO BAILOUT
MARK = 0
DO WHILE .T.
   MARK = MARK+1
   IF MARK>BAILOUT
      EXIT
   ENDIF
   GO MARK
   STORE ENTRY TO SCANNER
   SELECT 2
   LOCATE FOR ENTRY=SCANNER
   IF FOUND()
      STORE RFEND TO BIT1
      STORE RFEND TO BIT2
   ELSE
      STORE 1/2 TO BIT1
      STORE 0 TO BIT2
   ENDIF
   SELECT 1
   REPLACE BGFREQ WITH BIT2
   REPLACE ACTUAL WITH BIT1
   LOOP
ENDDO
SELECT 1
REPLACE ALL RATIO WITH RFEND/ACTUAL
? 'DOING FINAL SORT BY RATIO'
SORT ON RATIO/D, BGFREQ/D, DESCRIPTOR TO FINAL
USE FINAL

SET TALK OFF
DO CASE
CASE PTF=0
   SET DEVICE TO PRINT
   SET PRINT ON
CASE PTF=1
   SET ALTERNATE TO "Adenoid:Patent Figures:Subtraction.txt"
   SET ALTERNATE ON
ENDCASE
STORE VAL(SYS(2)) TO FINTIME
IF FINTIME<STARTIME
   STORE FINTIME+86400 TO FINTIME
ENDIF
STORE FINTIME - STARTIME TO COMPSEC
STORE COMPSEC/60 TO COMPMIN
SET MARGIN TO 10
@ 1,1 SAY "Library Subtraction Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,
?
?
?
? DATE()
?? '  '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,5,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
? Target1
IF Target2<>' '
   ?? ', '
   ?? Target2
ENDIF
IF Target3<>' '
   ?? ', '
   ?? Target3
ENDIF
? 'Subtracting: '
? Object1
IF Object2<>' '
   ?? ', '
   ?? Object2
ENDIF
IF Object3<>' '
   ?? ', '
   ?? Object3
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
   ?? 'All'
ENDIF
IF Ematch=1
   ?? 'Exact, '
ENDIF
IF Hmatch=1
   ?? 'Human, '
ENDIF
IF Omatch=1
   ?? 'Other sp. '
ENDIF
IF Imatch=1
   ?? 'INCYTE'
ENDIF
IF ANAL=1
   ? 'Sorted by ABUNDANCE'
ENDIF
IF ANAL=2
   ? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(TOT,5,0)
? 'Total clones analyzed: '
?? STR(STARTOT,5,0)
? 'Total computation time: '
?? STR(COMPMIN,5,2)
?? ' minutes'
? ' 

V'.'d.dchioaatlan f . distribution dotation, r * function . « species i = inte 

R 1 ™ ° iSOreeC l " A * < 0 ' 2 MM» PIXELS FWT W l9 COLOR 0,0.0, 

CASS ANALal 

?? STR(AUNIQUE ( 4,0) 

?? 1 genes, for a total of 1 • 

.?? 6ratANAMOT # A,0> * 

?? 1 clones' 

CLOSE DATABASES 

•tJSB/fiiraxtGuyrF0XBASE+/M£ie:£c»c files: clones, dbf 

CASE.ANAIrf 
* • arrange/function 
SEP PRINT CN 
SET HEADING CN 

SCREEN 1 «n 0 HBfiDDJS 'Screen IV AT 40,2 SIZB 286,492 PIXELS • FCHT .Helvetica', 2 6 B' COLOR 0 

1 binding proteins 1 

.r^r^LSet^t^;> T 4M Sm 286 ' 4M «*" » -«elvetica.. 2 6S COLOR 0 
SCREEN 1 TSfEE 0 HEADING ■Screen 1" AT 40 2 opr o-vwe. ' • . 

list OFF fields »fc.DAlXw«S^ ?.».P. 

'f^uS2*Lin?S^ enV ^ «°' 2 8123 - 28 . 6 « 492 "™ •** .'Helvetica.,265 color 0 

f'SiidT^^fl^;?^ 11 » «°' 2 "28. 266,492 PIXELS TOOT .Helvetica. ,265 COLOR 0 

SSL 0 ^^^^^^ 

S ^VSSSLs??^ *' W «°' 2 SI2B 286 '" 2 «« «*» helvetica., 265 COLOR 0 
SCREEN1 TYPE <> HEADING 'Screen 1' AT'40,2 SIZE 286,492 PIXELS FONT 'Geneva. ? M ,« „ « « 
list OFP fields ^^^.D.T.Z^t^r^.TSES^I^.K^Q,^^,^^,^^^^,^ 1 ^ °'. 0 ' 0 ' 

SCREEH i TYPE 0 HEADED*} -Screen IV-JW SIZE 286,492 PIXELS FONT 'Helvetica', 2 68 COLOR 0 
f^iaTSi^ 3 ' SCr9eil 11 * 4 °' 2 SI2E 296 ' 4M «»"--Helvetica.,265 COLOR 0 

f^-iiSi^pSSS,' 60 "" 1 l ' " *°' 2 S1ZZ 286 '" 2 ^vetica',265 COLOR 0 

SCRBElll TYPE 0 HEADING 'Screen 1' AT 40,2 SIZE 286,492 PIXELS FONT 'Geneva' i mrs» n« a 
list OFF fields M^l>,»,*.fcBm,«.nHCWTO °' 0 ' 0 ' 
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f^.l SZZt^ 1 ™ ,SCr9en *' AP . 40 ' 2 SIZE ««• FONT -H lvetica-,265 COLOR 0 

- "SCREOri WPE 0"H2ADXNG ^Screen 1" AT. 40,2 SIZE 205,492 PIXELS Pnwn .rE^Sft 7 ^Of 28L 
list OFF fields W«te.D.F,S,R ( .TO.S.n^^ 



SCREEN 1 TVPE 0 HEADIN3 -Screen 1- AT 40,2 SIZE 38$; 492 PIXELS FCOT *Helvehi M . 3« nr* M n 
? 'Kinases and Phosphatases t* . — »-« cwi Helvetica", 255 OOUQR 0 

SGEE^'TYPE 0 HEADING -Screen 1- AT 40,2 SIZE 286,493 FEELS FOOT -Geneva 7 mr/w n n * 
list OFF.fields auBber.D^Z.R.SH^^ 

f^r^tKtof^ a ' ^ 40,3 *" 8M ' 4M • 'Helvetica'. f 265 COLOR 0 

SCREEN 1 TYPE 0 HEADIKG 'Screen 1* AT 40.2 SIZE 286,492 PIXELS PCaw .r-«n^. t ™t™> 

list OFF fields wnte,p,r.B,R,M^ 0,0,0, 

SCREH* 1 TOTE 0 HEADJKC Screen !■ AT 40,2 SIZE 286,492 PIXELS FOOT 'Helvetia- 2«ft mr^ * 

? 1 PROTEIN SYNTHETIC MACKHJERY- PROTEINS' Helvetica ,268 COttfR 0 
7 _ 

SCFCQ^ 1 TYPE 0 HEADING "Screen 1- AT 40^2 SIZE 286,492 PIXELS FCNT 'Helvetia s« mr^ a 

? 'Transcription and Nucleic Acid-binding proteins i" r * LAfilUa Helvetica* ,265 COLOR 0 

iKSLi "Screen 1' AT 40,2 SIZE 286,492 PIXELS * FCOT -Geneva" 7 rnr™ n n n 

list OFF fields nun^,D,F,Z,R,Eh7TRr,S,DE5CW 

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40 2 q??? obi; ago dtvct e ' ' ■ 

? 'Translation:'^^ ' 2B6,492 PIXELS FCOT -Helvetica ",265 COLQR.O 

.SSI ^JS^JSAStSiSSSUR^^^ 

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40.2 SIZE 28S AM vryrrjs rmw .~ ■ „ 

Uat off fields n Wl ^.D;r < z,B,^ s ?^c^?|^^,^ j 0 ^;^ «^ ".0,0, 
fSt^SiSSff ■'■ mea *' ^ "' 2 SIZE " 286 '" 2 «■» -Helvetica. ,-265 COLOR 0 

SCREEN 1 WE 0 HEADIU3 -Screen 1- AT 40.2 SIZE 286,493 PIXELS , FONT 'Helvetic^ 268 COLOR 0 



?°feS^ IMS " SCreen 11 * T 4 °' 2 8IZB a ? 6 '" 2 "» 'Helvetica-^ COLOR 0 

saw ss^ssasa ■•••^ 

f^tLHf Jd^t^- 1 ' 40 * 2 S1ZE 286 '" 2 ? DlELS «» -Helv.tica.,265 COLOR 0 
6CREEN1 TYPE 0 -Screen 1- AT 40,3 SIZE 286,492 PIXELS FOOT •Geneva' 7 corns o n n 

list OTP fields nuniber;0,F,Z,R,Ein!RY,S, DESCRIPTOR, BGFREQ,RSEND,R&^M,I FOR^Rs'P'^^^ ' ' ' 

fS^pS^f^. r * 40 ' 2 SIZS 28S '" 2 ™ "» 'Helvetica-^ COLOR 0 
SCSEEH1 TYPE 0 HERMN6 'Screen V AT 40,2 SIZE 286,492 PIXELS FONT "Geneva- 7 CDWiR ft n ft 
list OFF field, number.D,F,Z,R,EHrRY,S,DESCRIFT^ 

fS« l " M 40,2 SIZE 2B6 '« 2 ™* FONT -Helvetica-,265 COLOR 0 

fiCREENl TYPE 0 HEADING -Screen 1' AT 40,2 SIZE 296,492 PIXELS PONT "Geneva- 7 emm 0 a n 
list off fields number, d,f,z, R, EOTRY.S, descriptor, bgfreq.rfqto, ratio, I for^= 'q 1 °'°' 0 ' 

S a^ LS^" a " A ' 40 ' 2 SIZE 2fl6 '" 2 PKEW ^ ■Kelvetica.,265 COLOR 0 
screen l TYPE 0 HEADING -Screen 1- AT 40,2 SIZS 286,492 PIXELS FOOT "Geneva',7 COLOR O',0,0; 
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u«t off fiews mte,D.*,*,Kmw,vua<m.jan^m^^ W r r-m- 

fSid Sli 8 ^ '""T 11 W "« 3 SHB «M» Pims- FOOT -Helvetica.,** COICR 0 

aw au ;s:?fi^^^ 

F5LJ S-Jar 5180 ' SCree " a " ** «>> S ™ »W02.«M 50NT .Helvetica',265 Ccrx* 0 

r". 1 ^ ° ^ »' ^ «•> «■ PXX^S ^ -Helvetica-.^ COUK 0 

y ' MTSCEEIAMBODB CATEGORIES' 

fCTEBJ 1 OTOE 0 HEADING •Screen 1" AT 40 3 CT7T ioir ^oo „ ' 

? 'Screes response!* • 0,3 5132 28e ' 492 pixels foot •Helvetica', 265 COLOR o 

DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
ERASE TEMPLIB.DBF
ERASE TEMPNUM.DBF
ERASE TEMPDESIG.DBF
SET MARGIN TO 0
CLEAR
LOOP
ENDDO

* Northern (single), version 11-25-94
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
CLEAR
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.
* Program.: Northern (single).fmt
* Date....: 8/ 8/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)

SCREEN l TVTPE 0 HEADING •Screen 1- AT 40 2 cttp »r ^ 
6 PIXELS 113 P 98 SAY "EUtry #i» puvtr ««m wwS ,i ««OU # --l,-l 

2 ? 20 ' 162 Bail STOLE 65536 FONT TrJI^I^^T , <S«neva-;274 COLOR 0,0.- 

* EOF: Northern (single).fmt
READ
IF Bail=2
CLEAR
screen 1 off
RETURN
ENDIF
USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON

IF Eobject<>' '
STORE UPPER(Eobject) TO Eobject
SET SAFETY OFF
SORT ON Entry TO "Lookup entry.dbf"
SET SAFETY ON
USE "Lookup entry.dbf"
LOCATE FOR Look=Eobject
IF .NOT.FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup entry.dbf"
ENDIF

IF Dobject<>' '
SET EXACT OFF
SET SAFETY OFF
SORT ON descriptor TO "Lookup descriptor.dbf"
SET SAFETY ON
USE "Lookup descriptor.dbf"
LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))
IF .NOT.FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup descriptor.dbf"
SET EXACT ON
ENDIF

IF Numb<>0
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
GO Numb
BROWSE
STORE Entry TO Searchval
ENDIF
CLEAR
? 'Northern analysis for entry '
?? Searchval
?
? 'Enter Y to proceed'
WAIT TO OK
CLEAR
IF UPPER(OK)<>'Y'
screen 1 off
RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
SW2=1
LOOP
ENDIF
GO MARK1
STORE library TO TESTA
SKIP
STORE library TO TESTB
IF TESTA = TESTB
DELETE
ENDIF
MARK1 = MARK1+1
LOOP
ENDDO ROLL

* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON

* MASTER ANALYSIS 3; VERSION 12-9-94
* Master menu for analysis output
CLOSE DATABASES
SET TALK OFF
SET SAFETY OFF
CLEAR
SET DEVICE TO SCREEN

£5 TO " SmartGuy : FoxBASE+/Mac : fox files: Output prop-amsi- 

USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE 0 TO ENTIRE
STORE 0 TO CONDEN
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO XMATCH
STORE 0 TO PRINTON
STORE 0 TO PTF
DO WHILE .T.
* Program.: Master analysis.fmt
* Date....: 12/ 9/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Master analysis



6 PIXELS 135,126 GET HMATCH ffTCLE 6S536 FONT -SSSS-'S PlSEP a*n SIZE X5 ' 62 00 



* EOF: Master analysis.fmt
READ
IF ANAL=9
CLEAR
CLOSE DATABASES
ERASE TEMPMASTER.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF
RETURN
ENDIF
clear
? INITIATE
? TERMINATE
? CONDEN
? ANAL
? Ematch
? Hmatch
? Omatch
? IMATCH
SET TALK ON
IF ENTIRE=2
USE "Unique libraries.dbf"
REPLACE ALL i WITH ' '
BROWSE FIELDS i,libname,library,total,entered AT 0,0
ENDIF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
^^JEJ^^ TOR *^M3ER>° INITIATE .AND . NUM5ER<=TERMIKATE 

COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
IF ENTIRE=1
APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
ENDIF
IF ENTIRE=2
USE "Unique libraries.dbf"
COPY TO SELECTED FOR UPPER(i)='Y'
USE SELECTED
STORE RECCOUNT() TO STOPIT
MARK=1
DO WHILE .T.
IF MARK>STOPIT
CLEAR
EXIT
ENDIF
USE SELECTED
GO MARK
STORE library TO THISONE
? 'COPYING '
?? THISONE
USE TEMPLIB
APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf" FOR library=THISONE
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
COUNT TO STARTOT
COPY STRUCTURE TO TEMPDESIG
USE TEMPDESIG
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
APPEND FROM TEMPLIB
ENDIF
IF Ematch=1
APPEND FROM TEMPLIB FOR D='E'
ENDIF
IF Hmatch=1
APPEND FROM TEMPLIB FOR D='H'
ENDIF
IF Omatch=1
APPEND FROM TEMPLIB FOR D='O'
ENDIF
IF Imatch=1
APPEND FROM TEMPLIB FOR D='I'.OR.D='X'.OR.D='N'
ENDIF
IF Xmatch=1
APPEND FROM TEMPLIB FOR D='X'
ENDIF
COUNT TO ANALTOT
set talk off



DO CASE
CASE PTF=0
SET DEVICE TO PRINT
SET PRINT ON
EJECT
CASE PTF=1

ffiT ALTSTO^TE TO -Total function aort.txb- 
S £ /rE * NWrE TO "H and 0 function sort. txf 

AM^aTE TO -Shear Stress HOVEC 2: Abundance sort, txf 
*f£ TO BShear Stress HUVEC 2: Abundance comtkf 

!££ TO " Shear Stress HUVEC 2:Punction sort txt- 

S 2= S Itllll SS SgSfSS^-- 
SRSSIF ■"— stre " 'SSiJl^St.. 

ENDCASE 

IF PRINTON=1
@ 1,30 SAY "Database Subset Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1
?
?
?
?
? date()
?? '   '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
? 'All libraries'
ENDIF
IF ENTIRE=2
MARK=1
DO WHILE .T.
IF MARK>STOPIT
EXIT
ENDIF
USE SELECTED
GO MARK
? '   '
?? TRIM(libname)
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
?? 'All'
ENDIF
IF Ematch=1
?? 'Exact,'
ENDIF
IF Hmatch=1
?? 'Human,'
ENDIF
IF Omatch=1
?? 'Other sp.'
ENDIF
IF Imatch=1
?? 'INCYTE'
ENDIF
IF Xmatch=1
?? 'EST'
ENDIF
IF CONDEN=1
? 'Condensed format analysis'
ENDIF
IF ANAL=1
? 'Sorted by NUMBER'
ENDIF
IF ANAL=2
? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?

7 '1 = library d = designation f = distribution z = location r = function c«:cer 
USE TEMPDESIG

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,

DO CASE
CASE ANAL=1
* sort/number
SET HEADING ON
IF CONDEN=1
SORT TO TEMP1 ON ENTRY,NUMBER
DO "COMPRESSION number.PRG"
ELSE
SORT TO TEMP1 ON NUMBER
USE TEMP1
list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR
list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT,I
CLOSE DATABASES
ERASE TEMP1.DBF
ENDIF

CASE ANAL=2
* sort/DESCRIPTOR
SET HEADING ON
*SORT TO TEMP1 ON DESCRIPTOR,ENTRY,NUMBER/S for D='E'.OR.D='H'.OR.D='O'.OR.D='X'.OR.D='I'
*SORT TO TEMP1 ON DESCRIPTOR,NUMBER/S for D='E'.OR.D='H'.OR.D='O'.OR.D='X'.OR.D='I'
SORT TO TEMP1 ON ENTRY,START/S for D='E'.OR.D='H'.OR.D='O'.OR.D='X'.OR.D='I'
IF CONDEN=1
DO "COMPRESSION entry.PRG"
ELSE
USE TEMP1
list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT,I
CLOSE DATABASES
ERASE TEMP1.DBF
ENDIF

CASE ANAL=3
* sort by abundance
SET HEADING ON
SORT TO TEMP1 ON ENTRY,NUMBER for D='E'.OR.D='H'.OR.D='O'.OR.D='X'.OR.D='I'
DO "Compression abundance.prg"

CASE ANAL=4
* sort/interest
SET HEADING ON
IF CONDEN=1
SORT TO TEMP1 ON ENTRY,NUMBER FOR I>0
DO "COMPRESSION interest.PRG"
ELSE
SORT ON I/D,ENTRY TO TEMP1 FOR I>1
USE TEMP1
list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT,I
close Databases
erase temp1.dbf
ENDIF

CASE ANAL=5
* arrange/location
SET HEADING ON
STORE 4 TO AMPLIFIER

? 'Nuclear:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Cytoplasmic:'
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Cytoskeleton:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Cell surface:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Intracellular membrane:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Mitochondrial:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Secreted:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Other:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Unknown:'
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

IF CONDEN=1
SET DEVICE TO PRINTER
SET PRINTER ON
EJECT
DO "Output heading.prg"
USE "Analysis location.dbf"
DO "Create bargraph.prg"
SET HEADING OFF

? ' FUNCTIONAL CLASS TOTAL UNIQUE NEW % TOTAL'

ERASE TEMP2.DBF
SET HEADING ON

*USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
CASE ANAL=6
* arrange/distribution
SET HEADING ON
STORE 3 TO AMPLIFIER

? 'Cell/tissue specific distribution:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression distrib.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Non-specific distribution:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression distrib.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Unknown distribution:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression distrib.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

IF CONDEN=1
SET DEVICE TO PRINTER
SET PRINTER ON
EJECT
DO "Output heading.prg"
USE "Analysis distribution.dbf"
DO "Create bargraph.prg"
SET HEADING OFF
? ' FUNCTIONAL CLASS TOTAL UNIQUE % TOTAL'

CLOSE°DATABA^S P,N ^' CI ^ S ' Q ^ S ' PE ^ 
ERASE TEMP2.DBF
SET HEADING ON

*USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"

CASE ANAL=7
* arrange/function
SET HEADING ON
STORE 10 TO AMPLIFIER

? ' BINDING PROTEINS'

? 'Surface molecules and receptors:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Calcium-binding proteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Ligands and effectors:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Other binding proteins:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

*EJECT

? ' ONCOGENES'

? 'General oncogenes:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'GTP-binding proteins:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Viral elements:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Kinases and Phosphatases:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Tumor-related antigens:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

*EJECT

? ' PROTEIN SYNTHETIC MACHINERY PROTEINS'

? 'Transcription and Nucleic Acid-binding proteins:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Translation:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Ribosomal proteins:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Protein processing:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

*EJECT

? ' ENZYMES'

? 'Ferroproteins:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Proteases and inhibitors:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Oxidative phosphorylation:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Sugar metabolism:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Amino acid metabolism:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Nucleic acid metabolism:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Lipid metabolism:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Other enzymes:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

*EJECT

? ' MISCELLANEOUS CATEGORIES'

? 'Stress response:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Structural:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Other clones:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Clones of unknown function:'
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

IF CONDEN=1
EJECT
*SET DEVICE TO PRINTER
*SET PRINT ON
DO "Output heading.prg"
USE "Analysis function.dbf"
DO "Create bargraph.prg"
SET HEADING OFF
*★* 

SCREW 1 TSPS 0 HEADING "Screen 1- AT 40.2 SIZE 2*6,492 PIXELS PORT W,U COLOR 0,0,0 

? » 

? 1 FUNCTIONAL CLASS ~ TOTAL TOTAL NEW DIST 

•? ■ u«.ijunau CLASS CLONES GEN2S GENES FUNCTIONAL CLASS' 

**• 

^mT^^^lSS^!?^^^^' PERC5NT,GRAPH # C04PANy 
CLOSE StA^S ***^' CU ^' G ^'^*^^ 
ERASE TEMP2.DBF
SET HEADING ON

*USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
CASE ANAL=8
DO "Subgroup summary 3.prg"
ENDCASE

DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
*ERASE TEMPLIB.DBF
*ERASE TEMPNUM.DBF
*ERASE TEMPDESIG.DBF
*ERASE SELECTED.DBF
CLEAR
LOOP
ENDDO

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
COUNT TO NEWGENES FOR D='H'.OR.D='O'
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*GO TOP
STORE Z TO LOC
USE "Analysis location.dbf"
LOCATE FOR Z=LOC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
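
The compression subroutine above walks a table sorted by ENTRY, deletes repeated entries, and stores the repeat count in RFEND. A minimal Python sketch of the same idea follows; it is an illustration only, not part of the original listing. The function name `compress_by_entry`, the dictionary field names, and the entry `HYPOTH01` are hypothetical.

```python
from itertools import groupby


def compress_by_entry(records, new_designations=("H", "O")):
    """Collapse clone records that share an entry into one row per gene.

    Each surviving row gets an 'rfend' count (the number of clones that
    matched that entry), mirroring how the listing reuses the RFEND field.
    """
    ordered = sorted(records, key=lambda r: r["entry"])  # plays the role of the sort on ENTRY
    genes = []
    for _, run in groupby(ordered, key=lambda r: r["entry"]):
        run = list(run)
        kept = dict(run[0])       # the first record of each run survives
        kept["rfend"] = len(run)  # duplicates are folded into a count
        genes.append(kept)
    total_clones = len(ordered)   # TOT in the listing
    unique_genes = len(genes)     # UNIQUE in the listing
    # NEWGENES in the listing: unique rows whose designation is 'H' or 'O'
    new_genes = sum(1 for g in genes if g["d"] in new_designations)
    genes.sort(key=lambda g: g["rfend"], reverse=True)  # like SORT ON RFEND/D
    return genes, total_clones, unique_genes, new_genes


# Hypothetical sample data; only HUMEF1B appears in the source document.
sample = [
    {"number": 1, "entry": "HUMEF1B", "d": "E"},
    {"number": 2, "entry": "HYPOTH01", "d": "H"},
    {"number": 3, "entry": "HUMEF1B", "d": "E"},
    {"number": 4, "entry": "HUMEF1B", "d": "E"},
]
genes, total, unique, new = compress_by_entry(sample)
print(unique, "genes, for a total of", total, "clones")
```

Grouping a sorted list replaces the listing's two nested DO WHILE loops and its DELETE/PACK pass, but the reported quantities (total clones, unique genes, new genes, per-gene count) are meant to correspond to TOT, UNIQUE, NEWGENES and RFEND.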
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* COMPRESSION SUBROUTINE ?0R ANALYSIS PROGRAMS 
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
**SET PRINTER ON
SORT ON DATE TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,4,0)
?? ' clones'
?
? ' V Coincidence'
COUNT TO P4 FOR I=4
IF P4>0
? STR(P4,3,0)
?? ' genes with priority = 4 (Secondary analysis:)'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=4
ENDIF
COUNT TO P3 FOR I=3
IF P3>0
? STR(P3,3,0)
?? ' genes with priority = 3 (Full insert sequence:)'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=3
ENDIF
COUNT TO P2 FOR I=2
IF P2>0
? STR(P2,3,0)
?? ' genes with priority = 2 (Primary analysis complete:)'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=2
ENDIF
COUNT TO P1 FOR I=1
IF P1>0
? STR(P1,3,0)
?? ' genes with priority = 1 (Primary analysis needed.)'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=1
ENDIF
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON NUMBER TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
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* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS 

USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
COUNT TO NEWGENES FOR D='H'.OR.D='O'
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
STORE R TO FUNC
USE "Analysis function.dbf"
LOCATE FOR P=FUNC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
SET HEADING ON
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
***
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I

*STal fSL° AT 40,2 SI2B 266,492 FIXEM *«" o.o. 

*SET PRINT OFF 
CLOSE DATABASES 
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG 
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* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS 
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
STORE F TO DIST
USE "Analysis distribution.dbf"
LOCATE FOR P=DIST
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
USE TEMP1
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'

list off fields ™<b»,RFOT^ 

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1

^^^^^ 

PACK ° ' 0R - D * A • 0R - Xte U '•W-^'S'.OR.D.-M'.OR.fc-R'.CR.fc'V 



COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D,NUMBER TO TEMP2
USE TEMP2

" iSSRsfr 8 tetal of * 

?? • clones* 



« ass v v ««"»'«mo' 

5KA2£WS^^ 0.0,0. 



CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1

COOT TO IDC3ENE FOR D-'S' .0R.D='0\0R.D= 'H' .GR.fe'N' .OR D='R' OR D='A. 
Jg» FOR **.CBLD-.^0^^ 

COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D,NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'

? w C°*? cidence V V Clones/10000 1 

set heading off
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
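
The abundance subroutine above stores, for each compressed gene, its clone count scaled to a common base of 10000 (RFEND/IDGENE*10000), so that libraries of different sizes can be compared. The following Python sketch shows that normalization under stated assumptions: the function name, the dictionary layout, and the sample numbers are hypothetical, and `identified_clones` stands in for IDGENE, which the listing computes with a COUNT over selected designations.

```python
def abundance_per_10000(gene_counts, identified_clones):
    """Scale per-gene clone counts to occurrences per 10000 identified clones.

    gene_counts maps an entry name to its clone count (RFEND in the listing);
    identified_clones plays the role of IDGENE.
    """
    if identified_clones <= 0:
        raise ValueError("identified_clones must be positive")
    return {
        entry: count * 10000 / identified_clones
        for entry, count in gene_counts.items()
    }


# Hypothetical counts: 6 and 3 clones out of 1200 identified clones.
scaled = abundance_per_10000({"HUMEF1B": 6, "GENE_X": 3}, 1200)
print(scaled)
```

Multiplying before dividing avoids a small floating-point rounding step; the listing itself divides first, which gives the same value up to rounding.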
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USE TEMPI 
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
?
list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND,INIT,I
ERASE TEMP1.DBF
USE TEMPDESIG

* Lifeseq menu; version 8-7-94
SET TALK OFF
set device to screen
CLEAR
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
STORE LUPDATE() TO Update
GO BOTTOM
STORE RECNO() TO cloneno
STORE 6 TO Chooser
DO WHILE .T.
* Program.: Lifeseq menu.fmt
* Date....: 1/11/95
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Lifeseq menu

a *™ "1.44 SMf -Total clones:- STYLE 65536 FONT 'Ce^Itf OMR 6 6 -f-i -1, 
8 PIXELS 4 3 ,296 SAY -vl.30- STYLE 65536 FOMT -Geneva- ,782 COLOR oTo, ^, -i;-l^l 

* EOF: Lifeseq menu.fmt
READ
DO CASE
CASE Chooser=1

SsBSS 2 FCX2ASE+/MaC,fOX files :Oatput programs blaster analysis 3.prg- 

' ^E^os^if CiX3ASE * /MaC J * °* fileS:0ut P ut Programs iSubtracticn 2.prg- 

^^ooSi4° XB; ^ +/MaC:fOX files:0ut P ut Program northern (single) .prg" 

USE "Libraries.dbf"
BROWSE
CASE Chooser=5

^SSn^*^ : f ° X file950ut P ut ProgramsiSee individual clone,prg» 

^SSs^r? 0 ^^^ 0 '^ £ilesjUbraries *OutPUt programs :Menu.prg- 
CLEAR
SCREEN 1 OFF
RETURN
ENDCASE
LOOP
ENDDO

@ 1,30 SAY "Database Subset Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1
?
?
? date()
?? '   '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
? 'All libraries'
ENDIF
IF ENTIRE=2
MARK=1
DO WHILE .T.
IF MARK>STOPIT
EXIT
ENDIF
USE SELECTED
GO MARK
? '   '
?? TRIM(libname)
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0
?? 'All'
ENDIF
IF Ematch=1
?? 'Exact,'
ENDIF
IF Hmatch=1
?? 'Human,'
ENDIF
IF Omatch=1
?? 'Other sp.'
ENDIF
IF CONDEN=1
? 'Condensed format analysis'
ENDIF
IF ANAL=1
? 'Sorted by NUMBER'
ENDIF
IF ANAL=2
? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?
?

USE TEMP1
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
?
list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND,INIT,I
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG

USE TEMP1
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
?
list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND,INIT,I
ERASE TEMP1.DBF
USE TEMPDESIG
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♦Northern (single) , version 11-25-94 

close databases
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
CLEAR
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.
* Program.: Northern (single).fmt
* Date....: 8/ 8/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)



I SSf ??i 7 L TO 192 ' 422 28447 COLOR 6,6,0 -25600 -li-I 

1 555? ^f'fS. 8 ** ,ftltr y #: ' SWLE «536 FOOT "Geneva " , 12 COLOR 0 0,0 -1 -1 -1 

2 525? ^\ 6 ?-. 9AY description- STYLE 65536 FOOT "Geneva", 12 COLOR 0 0 6 -1 -l -1 

I Sf iihl 7 ! Q2T 0 ^ "0«W«M2 SIZE 15,241 COLOR 6 6 o'-i'5 -1 

1 ^A 8 ?.?* " Singl * Korther * ^arch screen" STYlT6S536 Fa^^w^ 274 # M 0 0 - 

• ££S SSI?^.^^^^ ,chica50< ' 12 picture "3 *r 2S^Si?2' s?^ 

2 ffiSuwW?^ .J?P* * 5536 m 'G«W;12 COLOR 0,0,0,-1,-1-1 

J pSI 1 \w -S? ®™ ° ^ ' Geneva "' 12 SIZE 15 '™ COLOR 0,0^0,-1,-1,-1 ' 
0 PIXELS 80,152 SAY -Enter any ONE of the following:- STVLE 65536 FONT Geneva- 12 COLOR -1, 



* EOF: Northern (single).fmt
READ
IF Bail=2
CLEAR
screen 1 off
RETURN
ENDIF
USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON

IF Eobject<>' '
STORE UPPER(Eobject) TO Eobject
SET SAFETY OFF
SORT ON Entry TO "Lookup entry.dbf"
SET SAFETY ON
USE "Lookup entry.dbf"
LOCATE FOR Look=Eobject
IF .NOT.FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup entry.dbf"
ENDIF

IF Dobject<>' '
SET EXACT OFF
SET SAFETY OFF
SORT ON descriptor TO "Lookup descriptor.dbf"
SET SAFETY ON
USE "Lookup descriptor.dbf"
LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))
IF .NOT.FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup descriptor.dbf"
SET EXACT ON
ENDIF

IF Numb<>0
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
GO Numb
BROWSE
STORE Entry TO Searchval
ENDIF
CLEAR
? 'Northern analysis for entry '
?? Searchval
?
? 'Enter Y to proceed'
WAIT TO OK
CLEAR
IF UPPER(OK)<>'Y'
screen 1 off
RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
SW2=1
LOOP
ENDIF
GO MARK1
STORE library TO TESTA
SKIP
STORE library TO TESTB
IF TESTA = TESTB
DELETE
ENDIF
MARK1 = MARK1+1
LOOP
ENDDO ROLL

* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON
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CLOSE DATABASES 
SELECT 1
USE "Compressed libraries.dbf"
STORE RECCOUNT() TO Entries
SELECT 2
USE "Hits.dbf"
Mark=1
DO WHILE .T.
IF Mark>Entries
EXIT
ENDIF
GO MARK
STORE library TO Jigger
SELECT 2
COUNT TO Zog FOR library=Jigger
SELECT 1
REPLACE hits WITH Zog
Mark=Mark+1
LOOP
ENDDO

SELECT 1
BROWSE FIELDS LIBRARY,LIBNAME,ENTERED,HITS AT 0,0
CLEAR
? 'Enter Y to print:'
WAIT TO PRINSET
IF UPPER(PRINSET)='Y'
SET PRINT ON
CLEAR
EJECT

f^LJT^fSS^^ ?* 4 °' 2 SI2S 2B6 ' 4 " "™ *** "Geneva", 14 COLOR 0,0,0 
Searchval 

au as.ags.sAsar- 28s '« 2 «— ■<*—■•' «« 

o 

SELECT 2
list off fields number,LIBRARY,D,S,F,Z,R,ENTRY,DESCRIPTOR,RFSTART,START,RFEND
SET PRINT OFF
ENDIF

CLOSE DATABASES
SET TALK OFF
CLEAR
DO "Test print.prg"
RETURN
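
The Northern (single) program above copies every clone whose entry matches the search value into a hits table, then counts those hits per library. A short Python sketch of that per-library tally follows, as an illustration rather than a transcription of the listing. The function name `electronic_northern` and the record layout are hypothetical, and the `entered` values in the sample are made up; the clone numbers, library codes, and the HUMEF1B entry are taken from Table 6.

```python
from collections import Counter


def electronic_northern(clones, libraries, entry):
    """Count, for each library with entered clones, the clones matching entry."""
    hits = Counter(c["library"] for c in clones if c["entry"] == entry)
    seen = set()
    report = []
    for lib in sorted(libraries, key=lambda l: l["library"]):
        # Skip empty libraries and repeated library codes, as the
        # listing's library-compression step does.
        if lib["entered"] == 0 or lib["library"] in seen:
            continue
        seen.add(lib["library"])
        report.append({**lib, "hits": hits[lib["library"]]})
    return report


clones = [
    {"number": 2304, "library": "U937NOT01", "entry": "HUMEF1B"},
    {"number": 3240, "library": "HMC1NOT01", "entry": "HUMEF1B"},
    {"number": 3289, "library": "HMC1NOT01", "entry": "HUMEF1B"},
    {"number": 4693, "library": "HMC1NOT01", "entry": "HUMEF1B"},
    {"number": 8389, "library": "HMC1NOT01", "entry": "HUMEF1B"},
    {"number": 9139, "library": "HMC1NOT01", "entry": "HUMEF1B"},
]
libraries = [  # 'entered' counts are hypothetical
    {"library": "HMC1NOT01", "libname": "Mast cell line HMC-1", "entered": 100},
    {"library": "U937NOT01", "libname": "U937, monocytic leuk", "entered": 100},
    {"library": "STOMNOT01", "libname": "Stomach", "entered": 0},
]
report = electronic_northern(clones, libraries, "HUMEF1B")
for row in report:
    print(row["library"], row["libname"], row["hits"])
```

A `Counter` replaces the listing's loop over two work areas (SELECT 1 and SELECT 2); each row of the result corresponds to one line of the BROWSE display of LIBRARY, LIBNAME, ENTERED and HITS.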
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TABLE 6 



library    libname
ADENINB01  Inflamed adenoid
ADRENOR01  Adrenal gland (r)
ADRENOT01  Adrenal gland (T)
AMLBNOT01  AML blast cells (T)
BMARNOT01  Bone marrow
BMARNOT02  Bone marrow (T)
CARDNOT01  Cardiac muscle (T)
CHAONOT01  Chin. hamster ovary
CORNNOT01  Corneal stroma
FIBRAGT01  Fibroblast, AT 5
FIBRAGT02  Fibroblast, AT 30
FIBRANT01  Fibroblast, AT
FIBRNGT01  Fibroblast, uv 5
FIBRNGT02  Fibroblast, uv 30
FIBRNOT01  Fibroblast
FIBRNOT02  Fibroblast, normal
HMC1NOT01  Mast cell line HMC-1
HUVELPB01  HUVEC IFN,TNF,LPS
HUVENOB01  HUVEC control
HUVESTB01  HUVEC shear stress
HYPONOB01  Hypothalamus
KIDNNOT01  Kidney (T)
LIVRNOT01  Liver (T)
LUNGNOT01  Lung (T)
MUSCNOT01  Skeletal muscle (T)
OVIDNOB01  Oviduct
PANCNOT01  Pancreas, normal
PITUNOR01  Pituitary (r)
PITUNOT01  Pituitary (T)
PLACNOB01  Placenta
SINTNOT02  Small intestine (T)
SPLNFET01  Spleen+liver, fetal
SPLNNOT02  Spleen (T)
STOMNOT01  Stomach
SYNORAB01  Rheum. synovium
TBLYNOT01  T + B lymphoblast
TESTNOT01  Testis (T)
THP1NOB01  THP-1 control
THP1PEB01  THP phorbol
THP1PLB01  THP-1 phorbol LPS
U937NOT01  U937, monocytic leuk



numberlibrary d s f z t entry descriptor rfetartetaM rfend 

2304 U937NOT01 E H C C T HUMEF4B Elongation lador 1-bela tj. 0 77/ 

3240 HMC1NOT01 6 H C C T HUMEF1B Elongation fador 1 -beta 0 „i 

32B9 HMC1NOT01 E H C C T HUMEF1B Etongaiion (actor 1-beta 0 t?< Lit 

4693 HMC1NOT01 E H C C T HUMEFlB Elongation factor 1-beta 0 470 773 

8389 HMC1NOT01 E H C C T HUMEF1B Elongation facto n -bete 0 327 77a 

9139 HMC1NOT01 E H C C T HUMEF1B Elongation factor 1-beta 0 375 773 
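The Northern analysis routine listed before Table 6 copies every clone whose entry matches the search value into a hits table and then stores, for each library, the number of hits that came from it. A minimal Python sketch of the same tally follows. The record layout (dictionaries with `library` and `entry` keys), the function name `electronic_northern`, and the sample entry `OTHER` are illustrative assumptions rather than part of the original program, and the sketch uses exact string equality where the FoxBASE comparison may behave differently.

```python
from collections import Counter


def electronic_northern(clones, libraries, searchval):
    """Count, per library, the clones whose entry equals searchval.

    clones    -- iterable of dicts with 'library' and 'entry' keys (assumed layout)
    libraries -- library codes to report, in display order
    Returns a list of (library, hits) pairs, one per library.
    """
    # Counterpart of: COPY TO "Hits.dbf" FOR entry=searchval
    hits = [c for c in clones if c["entry"] == searchval]
    # Counterpart of: COUNT TO Zog FOR library=Jigger (done once for all libraries)
    per_library = Counter(c["library"] for c in hits)
    # Counterpart of: REPLACE hits WITH Zog for every library record
    return [(lib, per_library.get(lib, 0)) for lib in libraries]


if __name__ == "__main__":
    sample = [
        {"library": "U937NOT01", "entry": "HUMEF1B"},
        {"library": "HMC1NOT01", "entry": "HUMEF1B"},
        {"library": "HMC1NOT01", "entry": "HUMEF1B"},
        {"library": "HMC1NOT01", "entry": "OTHER"},  # hypothetical non-matching clone
    ]
    for lib, n in electronic_northern(
        sample, ["HMC1NOT01", "U937NOT01", "STOMNOT01"], "HUMEF1B"
    ):
        print(lib, n)
```

Counting all libraries in one pass avoids rescanning the hits table once per library, which is what the record-by-record loop in the listing does.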
WHAT IS CLAIMED IS:

1. A method of analyzing a specimen containing gene
transcripts, said method comprising the steps of:

(a) producing a library of biological sequences;

(b) generating a set of transcript sequences, where
each of the transcript sequences in said set is indicative
of a different one of the biological sequences of the
library;

(c) processing the transcript sequences in a
programmed computer in which a database of reference
transcript sequences indicative of reference biological
sequences is stored, to generate an identified sequence
value for each of the transcript sequences, where each said
identified sequence value is indicative of a sequence
annotation and a degree of match between one of the
transcript sequences and at least one of the reference
transcript sequences; and

(d) processing each said identified sequence value to
generate final data values indicative of a number of times
each identified sequence value is present in the library.

2. The method of claim 1, wherein step (a) includes
the steps of:

obtaining a mixture of mRNA;
making cDNA copies of the mRNA;
isolating a representative population of clones
transfected with the cDNA and producing therefrom the
library of biological sequences.

3. The method of claim 1, wherein the biological
sequences are cDNA sequences.

4. The method of claim 1, wherein the biological
sequences are RNA sequences.

5. The method of claim 1, wherein the biological
sequences are protein sequences.

6. The method of claim 1, wherein a first value of
said degree of match is indicative of an exact match, and a
second value of said degree of match is indicative of a
non-exact match.



7. A method of comparing two specimens containing
gene transcripts, said method comprising:

(a) analyzing a first specimen according to the
method of claim 1;

(b) producing a second library of biological
sequences;

(c) generating a second set of transcript sequences,
where each of the transcript sequences in said second set
is indicative of a different one of the biological
sequences of the second library;

(d) processing the second set of transcript sequences
in said programmed computer to generate a second set of
identified sequence values known as further identified
sequence values, where each of the further identified
sequence values is indicative of a sequence annotation and
a degree of match between one of the biological sequences
of the second library and at least one of the reference
sequences;

(e) processing each said further identified sequence
value to generate further final data values indicative of a
number of times each further identified sequence value is
present in the second library; and

(f) processing the final data values from the first
specimen and the further identified sequence values from
the second specimen to generate ratios of transcript
sequences, each of said ratio values indicative of
differences in numbers of gene transcripts between the two
specimens.

8. A method of quantifying relative abundance of mRNA
in a biological specimen, said method comprising the steps
of:

(a) isolating a population of mRNA transcripts from
the biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts.

9. A diagnostic method which comprises producing a
gene transcript image, said method comprising the steps of:

(a) isolating a population of mRNA transcripts from a
biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts, where data determining the
relative abundance values of mRNA transcripts is the gene
transcript image of the biological specimen.

10. The method of claim 9, further comprising:

(e) providing a set of standard normal and diseased
gene transcript images; and

(f) comparing the gene transcript image of the
biological specimen with the gene transcript images of step
(e) to identify at least one of the standard gene
transcript images which most closely approximate the gene
transcript image of the biological specimen.

11. The method of claim 9, wherein the biological
specimen is biopsy tissue, sputum, blood or urine.

12. A method of producing a gene transcript image,
said method comprising the steps of:

(a) obtaining a mixture of mRNA;

(b) making cDNA copies of the mRNA;

(c) inserting the cDNA into a suitable vector and
using said vector to transfect suitable host strain cells
which are plated out and permitted to grow into clones,
each clone representing a unique mRNA;

(d) isolating a representative population of
recombinant clones;

(e) identifying amplified cDNAs from each clone in
the population by a sequence-specific method which
identifies gene from which the unique mRNA was transcribed;

(f) determining a number of times each gene is
represented within the population of clones as an
indication of relative abundance; and

(g) listing the genes and their relative abundance in
order of abundance, thereby producing the gene transcript
image.

13. The method of claim 12, also including the step
of diagnosing disease by:

repeating steps (a) through (g) on biological
specimens from random sample of normal and diseased humans,
encompassing a variety of diseases, to produce reference
sets of normal and diseased gene transcript images;

obtaining a test specimen from a human, and producing
a test gene transcript image by performing steps (a)
through (g) on said test specimen;

comparing the test gene transcript image with the
reference sets of gene transcript images; and

identifying at least one of the reference gene
transcript images which most closely approximates the test
gene transcript image.

14. A computer system for analyzing a library of
biological sequences, said system including:

means for receiving a set of transcript sequences,
where each of the transcript sequences is indicative of a
different one of the biological sequences of the library;
and

means for processing the transcript sequences in the
computer system in which a database of reference transcript
sequences indicative of reference biological sequences is
stored, wherein the computer is programmed with software
for generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and a degree
of match between a different one of the biological
sequences of the library and at least one of the reference
transcript sequences, and for processing each said
identified sequence value to generate final data values
indicative of a number of times each identified sequence
value is present in the library.

15. The system of claim 14, also including:

library generation means for producing the library of
biological sequences and generating said set of transcript
sequences from said library.

16. The system of claim 15, wherein the library
generation means includes:

means for obtaining a mixture of mRNA;

means for making cDNA copies of the mRNA;

means for inserting the cDNA copies into cells and
permitting the cells to grow into clones;

means for isolating a representative population of the
clones and producing therefrom the library of biological
sequences.
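The counting, ranking, ratio, and comparison steps recited in the claims above (for example claim 1(d), claim 7(f), claim 12(f) and (g), and the comparison steps of claims 10 and 13) can be illustrated with a short Python sketch. This is only one possible reading: the function names, the use of relative abundance rather than raw counts for the ratios, and the summed absolute difference used to pick the closest reference image are all assumptions made for illustration and are not taken from the claims.

```python
from collections import Counter


def transcript_image(identified):
    """Return [(gene, count, fraction)] sorted by decreasing abundance.

    identified -- iterable of gene identifiers, one per sequenced clone.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(identified)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, n, n / total) for gene, n in ranked]


def abundance_ratios(image_a, image_b):
    """Ratio of relative abundance (a over b) for genes present in both images."""
    frac_b = {gene: frac for gene, _, frac in image_b}
    return {gene: frac / frac_b[gene] for gene, _, frac in image_a if gene in frac_b}


def closest_reference(test_image, references):
    """Name of the reference image nearest to test_image.

    Distance is the summed absolute difference in relative abundance,
    an assumed measure chosen only for this sketch.
    """
    def distance(img_a, img_b):
        a = {g: f for g, _, f in img_a}
        b = {g: f for g, _, f in img_b}
        return sum(abs(a.get(g, 0.0) - b.get(g, 0.0)) for g in set(a) | set(b))

    return min(references, key=lambda name: distance(test_image, references[name]))


if __name__ == "__main__":
    # Hypothetical gene labels, not data from the application.
    test = transcript_image(["A", "B", "A", "C"])
    refs = {
        "normal": transcript_image(["A", "A", "B", "C"]),
        "diseased": transcript_image(["A", "B", "B", "B"]),
    }
    print(test)
    print(abundance_ratios(test, refs["diseased"]))
    print(closest_reference(test, refs))
```

Genes found in only one of the two images are left out of the ratio table here, since a ratio against zero is undefined; a real implementation would need a stated policy for such cases.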






WO 95/20681 



PCT/US95/01160 



SYBASE database Structure

Library Preparation



Mark Schena,* Dari Shalon,*† Ronald W. Davis,
Patrick O. Brown*

A high-capacity system was developed to monitor the expression of
many genes in parallel. Microarrays prepared by high-speed robotic
printing of complementary DNAs on glass were used for quantitative
expression measurements of the corresponding genes. Because of the
small format and high density of the arrays, hybridization volumes of
2 microliters could be used that enabled detection of rare transcripts
in probe mixtures derived from 2 micrograms of total cellular
messenger RNA. Differential expression measurements of 45 Arabidopsis
genes were made by means of simultaneous, two-color fluorescence
hybridization.



The temporal, developmental, topographical, histological, and
physiological patterns in which a gene is expressed provide clues to
its biological role. The large and expanding database of complementary
DNA (cDNA) sequences from many organisms (1) presents the opportunity
of defining these patterns at the level of the whole genome.

For these studies, we used the small flowering plant Arabidopsis
thaliana as a model organism. Arabidopsis possesses many advantages
for gene expression analysis, including the fact that it has the
smallest genome of any higher eukaryote examined to date (2).
Forty-five cloned Arabidopsis cDNAs (Table 1), including 14 complete
sequences and 31 expressed sequence tags (ESTs), were used as
gene-specific targets. We obtained the ESTs by selecting cDNA clones
at random from an Arabidopsis cDNA library. Sequence analysis revealed
that 28 of the 31 ESTs matched sequences in the database (Table 1).
Three additional cDNAs from other organisms served as controls in the
experiments.

The 48 cDNAs, averaging ~1.0 kb, were amplified with the polymerase
chain reaction (PCR) and deposited into individual wells of a 96-well
microtiter plate. Each sample was duplicated in two adjacent wells to
allow the reproducibility of the arraying and hybridization process to
be tested. Samples from the microtiter plate were printed onto glass
microscope slides in an area measuring 3.5 mm by 5.5 mm with the use
of a high-speed arraying machine (3). The arrays were processed by
chemical and heat treatment to attach the DNA sequences to the glass
surface and denature them (3). Three arrays, printed in a single lot,
were used for the experiments here. A single microtiter plate of PCR
products provides sufficient material to print at least 500 arrays.

Fluorescent probes were prepared from total Arabidopsis mRNA (4) by a
single round of reverse transcription (5). The Arabidopsis mRNA was
supplemented with human acetylcholine receptor (AChR) mRNA at a
dilution of 1:10,000 (w/w) before cDNA synthesis to provide an
internal standard for calibration (5). The resulting fluorescently
labeled cDNA mixture was hybridized to an array at high stringency (6)
and scanned with a laser (3). A high-sensitivity scan gave signals
that saturated the detector at nearly all of the Arabidopsis target
sites (Fig. 1A). Calibration relative to the AChR mRNA standard
(Fig. 1A) established a sensitivity limit of ~1:50,000. No detectable
hybridization was observed to either the rat glucocorticoid receptor
(Fig. 1A) or the yeast TRP4 (Fig. 1A) targets even at the highest
scanning sensitivity. A moderate-sensitivity scan of the same array
allowed linear detection of the more abundant transcripts (Fig. 1B).
Quantitation of both scans revealed a range of expression levels
spanning three orders of magnitude for the 45 genes tested (Table 2).
RNA blots (7) for several genes (Fig. 2) corroborated the expression
levels measured with the microarray to within a factor of 5 (Table 2).

[Fig. 1 panels: (A) High sensitivity; (B) Moderate sensitivity;
(C) Wild type; (D) HAT4 transgenic; (E) Root tissue; (F) Leaf tissue.
Color scale: expression level (w/w).]

Differential gene expression was investigated with a simultaneous,
two-color hybridization scheme, which served to minimize experimental
variation inherent in the comparison of independent hybridizations.
Fluorescent probes were prepared from two mRNA sources with the use of
reverse transcriptase in the presence of fluorescein- and
lissamine-labeled nucleotide analogs, respectively (5). The two probes
were then mixed together in equal proportions, hybridized to a single
array, and scanned separately for fluorescein and lissamine emission
after independent excitation of the two fluorophores (3).
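The calibration against the AChR internal standard described above can be illustrated with a small Python sketch. It assumes that, within the linear range of the detector, fluorescence signal is proportional to transcript abundance, so a target's abundance is its signal relative to the standard's signal multiplied by the standard's known dilution. The function names and the signal values are hypothetical; the report does not give the formula it used.

```python
from fractions import Fraction


def expression_level(signal, standard_signal, standard_dilution=Fraction(1, 10000)):
    """Estimate transcript abundance (w/w) from fluorescence signals.

    Assumes signal is proportional to abundance (linear detector range).
    standard_dilution is the known abundance of the spiked standard,
    1:10,000 by default as in the experiment described in the text.
    """
    if standard_signal <= 0:
        raise ValueError("standard_signal must be positive")
    return Fraction(signal) / Fraction(standard_signal) * standard_dilution


def as_ratio(level):
    """Format a non-zero abundance such as 1/2000 in the '1:N' style of the text."""
    return "1:{:,}".format(round(1 / level))


if __name__ == "__main__":
    # Hypothetical signals in arbitrary units: the standard reads 100.
    print(as_ratio(expression_level(500, 100)))  # five times the standard
    print(as_ratio(expression_level(20, 100)))   # one fifth of the standard
```

With the standard at 1:10,000, a target giving one fifth of the standard's signal corresponds to 1:50,000 under this assumption, which is the order of the sensitivity limit quoted in the text.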

To test whether overexpression of a single gene could be detected in a
pool of total Arabidopsis mRNA, we used a microarray to analyze a
transgenic line overexpressing the single transcription factor HAT4
(8). Fluorescent probes representing mRNA from wild-type and
HAT4-transgenic plants were labeled with fluorescein and lissamine,
respectively; the two probes were then mixed and hybridized to a
single array. An intense hybridization signal was observed at the
position of the HAT4 cDNA in the lissamine-specific scan (Fig. 1D),
but not in the fluorescein-specific scan of the same array (Fig. 1C).
Calibration with AChR mRNA added to the fluorescein and lissamine cDNA
synthesis reactions at dilutions of 1:10,000 (Fig. 1C) and 1:100
(Fig. 1D), respectively, revealed a 50-fold elevation of HAT4 mRNA in
the transgenic line relative to its abundance in wild-type plants
(Table 2). This magnitude of HAT4 overexpression matched that inferred
from the Northern (RNA) analysis within a factor of 2 (Fig. 2 and
Table 2). Expression of all the other genes monitored on the array
differed by less than a factor of 5 between HAT4-transgenic and
wild-type plants (Fig. 1, C and D, and Table 2). Hybridization of
fluorescein-labeled glucocorticoid receptor cDNA (Fig. 1C) and
lissamine-labeled TRP4 cDNA (Fig. 1D) verified the presence of the
negative control targets and the lack of optical cross talk between
the two fluorophores.

Fig. 1. Gene expression monitored with the use of cDNA microarrays.
Fluorescent scans represented in pseudocolor correspond to
hybridization intensities. Color bars were calibrated from the signal
obtained with the use of known concentrations of human AChR mRNA in
independent experiments. Numbers and letters on the axes mark the
position of each cDNA. (A) High-sensitivity fluorescein scan after
hybridization with fluorescein-labeled cDNA derived from wild-type
plants. (B) Same array as in (A) but scanned at moderate sensitivity.
(C and D) A single array was probed with a 1:1 mixture of
fluorescein-labeled cDNA from wild-type plants and lissamine-labeled
cDNA from HAT4-transgenic plants. The single array was then scanned
successively to detect the fluorescein fluorescence corresponding to
mRNA from wild-type plants (C) and the lissamine fluorescence
corresponding to mRNA from HAT4-transgenic plants (D). (E and F) A
single array was probed with a 1:1 mixture of fluorescein-labeled cDNA
from root tissue and lissamine-labeled cDNA from leaf tissue. The
single array was then scanned successively to detect the fluorescein
fluorescence corresponding to mRNAs expressed in roots (E) and the
lissamine fluorescence corresponding to mRNAs expressed in leaves (F).

[Fig. 2 labels: Wild type; CAB1, HAT4, ROC1, Human AChR; mRNA (ng):
20, 2.0, 0.2.]

Fig. 2. Gene expression monitored with RNA (Northern) blot analysis.
Designated amounts of mRNA from wild-type and HAT4-transgenic plants
were spotted onto nylon membranes and probed with the cDNAs indicated.
Purified human AChR mRNA was used for calibration.

To explore a more complex alteration in expression patterns, we
performed a second two-color hybridization experiment with
fluorescein- and lissamine-labeled probes prepared from root and leaf
mRNA, respectively. The scanning sensitivities for the two
fluorophores were normalized by matching the signals resulting from
AChR mRNA, which was added to both cDNA synthesis reactions at a
dilution of 1:1000 (Fig. 1, E and F). A comparison of the scans
revealed widespread differences in gene expression between root and
leaf tissue (Fig. 1, E and F). The mRNA from the light-regulated CAB1
gene was ~500-fold more abundant in leaf (Fig. 1F) than in root tissue
(Fig. 1E). The expression of 26 other genes differed between root and
leaf tissue by more than a factor of 5 (Fig. 1, E and F).

The HAT4-transgenic line we examined has elongated hypocotyls, early
flowering, poor germination, and altered pigmentation (8). Although
changes in expression were


Table 1. Sequences contained on the cDNA microarray. Shown is the
position, the known or putative function, and the accession number of
each cDNA in the microarray (Fig. 1). All but three of the ESTs used
in this study matched a sequence in the database. NADH, reduced form
of nicotinamide adenine dinucleotide; ATPase, adenosine
triphosphatase; GTP, guanosine triphosphate.

Position  cDNA    Function                        Accession number
a1,2      AChR    Human AChR
a3,4      EST3    Actin                           H36236
a5,6      EST6    NADH dehydrogenase              Z27010
a7,8      AAC1    Actin 1                         M20016
a9,10     EST12   Unknown                         U36594†
a11,12    EST13   Actin                           T45783
b1,2      CAB1    Chlorophyll a/b binding         M65150
b3,4      EST17   Phosphoglycerate kinase         T44490
b5,6      GA4     Gibberellic acid biosynthesis   L37126
b7,8      EST19   Unknown                         U36595†
b9,10     GBF-1   G-box binding factor 1          X63894
b11,12    EST23   Elongation factor               X52256
c1,2      EST29   Aldolase                        T04477
c3,4      GBF-2   G-box binding factor 2          X63895
c5,6      EST34   Chloroplast protease            R87034
c7,8      EST35   Unknown                         T14152
c9,10     EST41   [not legible]                   T22720
c11,12    rGR     Rat glucocorticoid receptor     M14053
d1,2      EST42   Unknown                         U36596†
d3,4      EST45   ATPase                          J04185
d5,6      HAT1    Homeobox-leucine zipper 1       U09332
d7,8      EST46   Light harvesting complex        T04063
d9,10     EST49   Unknown                         T76267
d11,12    HAT2    Homeobox-leucine zipper 2       U09335
e1,2      HAT4    Homeobox-leucine zipper 4       M90394
e3,4      EST50   Phosphoribulokinase             T04344
e5,6      HAT5    Homeobox-leucine zipper 5       M90416
e7,8      EST51   Unknown                         Z33675
e9,10     HAT22   Homeobox-leucine zipper 22      U09336
e11,12    EST52   Oxygen evolving                 T21749
f1,2      EST59   Unknown                         Z34607
f3,4      KNAT1   Knotted-like homeobox 1         U14174
f5,6      EST60   RuBisCO small subunit           X14564
f7,8      EST69   Translation elongation factor   T42799
f9,10     PPH1    Protein phosphatase 1           U34803
f11,12    EST70   Unknown                         T44621
g1,2      EST75   Chloroplast protease            T43698
g3,4      EST78   Unknown                         R65481
g5,6      ROC1    Cyclophilin                     L14844
g7,8      EST82   GTP binding                     X59152
g9,10     EST83   Unknown                         Z33795
g11,12    EST84   Unknown                         T45276
h1,2      EST91   Unknown                         T13832
h3,4      EST96   Unknown                         R64816
h5,6      SAR1    Synaptobrevin                   M90418
h7,8      EST100  Light harvesting complex        Z18205
h9,10     EST103  Light harvesting complex        X03909
h11,12    TRP4    Yeast tryptophan biosynthesis   X04273

† ... of Stratagene (La Jolla, California).



observed for HAT4, large changes in expression were not observed for
any of the other 44 genes we examined. This was somewhat surprising,
particularly because comparative analysis of leaf and root tissue
identified 27 differentially expressed genes. Analysis of an expanded
set of genes may be required to identify genes whose expression
changes upon HAT4 overexpression; alternatively, a comparison of mRNA
populations from specific tissues of wild-type and HAT4-transgenic
plants may allow identification of downstream genes.

At the current density of robotic printing, it is feasible to scale up
the fabrication process to produce arrays containing 20,000 cDNA
targets. At this density, a single array would be sufficient to
provide gene-specific targets encompassing nearly the entire
repertoire of expressed genes in the Arabidopsis genome (2). The
availability of 20,274 ESTs from Arabidopsis (1, 9) would provide a
rich source of templates for such studies.

The estimated 100,000 genes in the human genome (10) exceeds the
number of Arabidopsis genes by a factor of 5 (2). This modest increase
in complexity suggests that similar cDNA microarrays, prepared from
the rapidly growing repertoire of human ESTs (1), could be used to
determine the expression patterns of tens of thousands of human genes
in diverse cell types. Coupling an amplification strategy to the
reverse transcription reaction (11) could make it feasible to monitor
expression even in minute tissue samples. A wide variety of acute and
chronic physiological and pathological conditions might lead to
characteristic changes in the patterns of gene expression in
peripheral blood cells or other easily sampled tissues. In concert
with cDNA microarrays for monitoring complex expression patterns,
these tissues might therefore serve as sensitive in vivo sensors for
clinical diagnosis. Microarrays of cDNAs could thus provide a useful
link between human gene sequences and clinical medicine.



Table 2. Gene expression monitoring by microarray and RNA blot
analyses; tg, HAT4-transgenic. See Table 1 for additional gene
information. Expression levels (w/w) were calibrated with the use of
known amounts of human AChR mRNA. Values for the microarray were
determined from microarray scans (Fig. 1); values for the RNA blot
were determined from RNA blots (Fig. 2).

            Expression level (w/w)
Gene        Microarray   RNA blot
CAB1        1:43         1:83
CAB1 (tg)   1:120        1:150
HAT4        1:8300       1:6300
HAT4 (tg)   1:150        1:210
ROC1        1:1200       1:1800
ROC1 (tg)   1:260        1:1300
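As a worked check of the fold-change figures quoted in the text, the following Python sketch reads the HAT4 entries of Table 2 as 1:8300 (wild type) and 1:150 (transgenic) by microarray, and 1:6300 and 1:210 by RNA blot. That row assignment and the helper names `parse_level` and `fold_change` are assumptions of this sketch. Because abundance is 1/N, the fold elevation in the transgenic line is N for wild type divided by N for the transgenic line.

```python
def parse_level(text):
    """Convert an abundance written as '1:N' into the number N."""
    one, n = text.split(":")
    if one.strip() != "1":
        raise ValueError("expected a '1:N' abundance, got %r" % text)
    return float(n.replace(",", ""))


def fold_change(wild_type, transgenic):
    """Fold elevation in the transgenic line: (1/N_tg) / (1/N_wt) = N_wt / N_tg."""
    return parse_level(wild_type) / parse_level(transgenic)


microarray = fold_change("1:8300", "1:150")
rna_blot = fold_change("1:6300", "1:210")
# How far the two methods disagree, expressed as a factor of at least 1.
agreement = max(microarray, rna_blot) / min(microarray, rna_blot)

print(round(microarray, 1), round(rna_blot, 1), round(agreement, 2))
# prints: 55.3 30.0 1.84
```

Under these readings the microarray gives roughly a 55-fold elevation and the RNA blot a 30-fold elevation, which agrees with the statements in the text of an approximately 50-fold elevation and agreement between the two methods within a factor of 2.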
REFERENCES AND NOTES

1. The airani EST database ftib€ST refease 09149S 
^/Sl!!^^"? -2?"^ ^ Biotachnotagy Wom». 
ton (Btfhascta. MO) centals a total of 322*25 on- 
haWhg 255.645 ton the hunan oenome 

W^wna^u^efat.rtanr^ 1,367 (ia£ 
P. Jan* af at. AM Aft* &* 24. 666 0»4); L U 
Guanef at, A4* Gen Genat 245, 390(18941 

3. O.Shaten,ihew,St«rrfordUv wity ( 19Q5 j^ 

f*f-, a . Brown. In prapamflon. Mooanayi wen 

t^iMmonep*^tlp.7hetoloaded MoTPCfl 
Progjie mpyrnQ torn 96-wai mopttor 

and depostad H)A16 fj par cue on 40 site at a 

2^*5»»"n. The printed sio^ererenydna- 
•dtor 2 hui fn a humfd ctambar. anap^jriad at 
™^i?J ™** h 0.1% SOS, and treated 
anhycrida prepared h buff? 

^oreiTO. Mcroamiyx wars warmed w*h Tlaaar 
• computer 

p^ nK ^tatar alowBd aeouentlal excitation of 
the two borophoras. Emitted ight was apft accord- 

o* » anabgHCHOlgttal board. AoWkxtaJoaMfe 
»n«ofe^fpbrownOcmgr^ 

^W^4l^ W ™^ N - 

TuSSfS^'Sl^ Ptrataoar^TnSSS 
" »iJ reactions contained CL1 nn/ui of 

^J^mWM, 0.1 ng/^j of hunan aSw 

•J**""**. 0.03 U/,J of ifaonuctease bfock, 500 

OfjWnosh. triphosphate. 500 nM dTTP, 40 
MWdjcxyc^^ 
oj^^P (or fcsaamirie-SKCTPj. and 0.03 
U'fJ of StraoScript reverse transolptasa Reactiom 

1 "** STA PH 8.0). Samptes wore than 
nB3ledte3rraiat04^aridGhiledoniet.Th»Rf4A 
w« dagradad by addhg 025 ,J of 10 N n£h 
«owad by a lO-m* hcubetton at 37*C. The tam- 
Ptes nautrafaad by addition of Z5 ,»] oM M 
«^*Hao» and QJ2& M of 10 N HO and pracb- 

etfwn* oned to cornpWon in a cpeedvae. rasu*. 
pano*djn 10 ^ of H.O. and reduced to 3.0 Jha 
apea^awresce^^ 

cwaj^hasepnsrJuc!©^ 
cutter (lOx same cooun cfeate (SSQ and 02% 

«5 n*™™***** end ooverad with cow stas 
^^J^^^^^nrta^toahy^ 
baton Camber (3) and hcubatad for 18 hours at 

« « i53?r?l e % * ShBd ** 5 * wo™ <"Wr- 

^S^Vwcy wash buffer (Q.i x SSC andai% 

rPlt ^I!?**" 1 """"^ h ai x SSC wftn the use 
of a feorasora l»Mr^canning devicaP). 

• mRNA K 5) were spotted onto 
IMu *-T> and crossSnkad with ii 



9 - 2; ^1*^"^ <J051 (1995; T. 
,„ *'™°"*))«ot 106. 1241 (1994) 

2£ 1966 d 991); C. 
Gene Therapy in Peripheral Blood
Lymphocytes and Bone Marrow for
ADA− Immunodeficient Patients

Alberto G. Ugazio, Fulvio Mavilio

Two different retroviral vectors were used to transfer ex vivo the
human ADA minigene into bone marrow cells and peripheral blood
lymphocytes from two patients undergoing exogenous enzyme replacement
therapy. After 2 years of treatment, long-term survival of T and B
lymphocytes, marrow cells, and granulocytes expressing the transferred
ADA gene was demonstrated and resulted in normalization of the immune
repertoire and restoration of cellular and humoral immunity. After
discontinuation of treatment, T lymphocytes, derived from transduced
peripheral blood lymphocytes, were progressively replaced by
marrow-derived T cells in both patients. These results indicate
successful gene transfer into long-lasting progenitor cells, producing
a functional multilineage progeny.



Severe combined immunodeficiency associated with inherited deficiency
of ADA (1) is usually fatal unless affected children are kept in
protective isolation or the immune system is reconstituted by bone
marrow transplantation from a human leukocyte antigen (HLA)-identical
sibling donor (2). This is the therapy of choice, although it is
available only for a minority of patients. In recent years other forms
of therapy have been developed, including transplants from
haploidentical donors (3, 4), exogenous enzyme replacement (5), and
somatic-cell gene therapy (6-9).

We previously reported a preclinical model in which ADA gene transfer
and expression successfully restored immune function in human
ADA-deficient (ADA−) peripheral blood lymphocytes (PBLs) in
immunodeficient mice in vivo (10, 11). On the basis of these
preclinical results, the clinical application of gene therapy for the
treatment of ADA− SCID (severe combined immunodeficiency disease)
patients who previously failed exogenous enzyme replacement therapy
was approved by our Institutional Ethical Committees and by the
Italian National Committee for Bioethics (12). In addition to
evaluating the safety and efficacy of the gene therapy procedure, the
aim of the study was to define the relative role of PBLs and
hematopoietic stem cells in the long-term reconstitution of immune
functions after retroviral vector-mediated ADA gene transfer. For this
purpose, two structurally identical vectors expressing the human ADA
complementary DNA (cDNA), distinguishable by the presence of
alternative restriction sites in a nonfunctional region of the viral
long-terminal repeat (LTR), were used to transduce PBLs and bone
marrow (BM) cells independently. This procedure allowed identification
of the origin of

... crosslinked with ultraviolet light with the use of a Stratalinker
1800 (Stratagene). Probes were prepared by random priming with the use
of a Prime-It kit (Stratagene) in ... carried out according to the
instructions of the manufacturer ...
METHOD AND APPARATUS FOR FABRICATING
MICROARRAYS OF BIOLOGICAL SAMPLES

Field of the Invention

This invention relates to a method and apparatus
for fabricating microarrays of biological samples for
large scale screening assays, such as arrays of DNA
samples to be used in DNA hybridization assays for
genetic research and diagnostic applications.

Background of the Invention

A variety of methods are currently available for
making arrays of biological macromolecules, such as
arrays of nucleic acid molecules or proteins. One
method for making ordered arrays of DNA on a porous
membrane is a "dot blot" approach. In this method, a
vacuum manifold transfers a plurality, e.g., 96,
aqueous samples of DNA from 3 millimeter diameter wells
to a porous membrane. A common variant of this
procedure is a "slot-blot" method in which the wells
have highly-elongated oval shapes.

The DNA is immobilized on the porous membrane by
baking the membrane or exposing it to UV radiation.
This is a manual procedure practical for making one
array at a time and usually limited to 96 samples per
array. "Dot-blot" procedures are therefore inadequate
for applications in which many thousand samples must be
determined.

A more efficient technique employed for making
ordered arrays of genomic fragments uses an array of
pins dipped into the wells, e.g., the 96 wells of a
microtitre plate, for transferring an array of samples
to a substrate, such as a porous membrane. One array
includes pins that are designed to spot a membrane in a
staggered fashion, for creating an array of 9216 spots
in a 22 x 22 cm area (Lehrach, et al., 1990). A
limitation with this approach is that the volume of DNA
spotted in each pixel of each array is highly variable.
In addition, the number of arrays that can be made with
each dipping is usually quite small.

An alternate method of creating ordered arrays of
nucleic acid sequences is described by Pirrung, et al.
(1992), and also by Fodor, et al. (1991). The method
involves synthesizing different nucleic acid sequences
at different discrete regions of a support. This
method employs elaborate synthetic schemes, and is
generally limited to relatively short nucleic acid
samples, e.g., less than 20 bases. A related method has
been described by Southern, et al. (1992).

Khrapko, et al. (1991) describes a method of
making an oligonucleotide matrix by spotting DNA onto a
thin layer of polyacrylamide. The spotting is done
manually with a micropipette.

None of the methods or devices described in the
prior art are designed for mass fabrication of
microarrays characterized by (i) a large number of
micro-sized assay regions separated by a distance of
50-200 microns or less, and (ii) a well-defined amount,
typically in the picomole range, of analyte associated
with each region of the array.
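To put the spacing figures above side by side, the following Python sketch compares the pin-array example (9216 spots in a 22 x 22 cm area) with assay regions spaced 50 to 200 microns apart. It assumes, purely for illustration, that spots sit on a regular square grid (9216 = 96 x 96) and that the quoted separation is the center-to-center pitch; the text itself describes a staggered layout and does not give a pitch, and the function names are hypothetical.

```python
import math


def grid_pitch_mm(n_spots, side_mm):
    """Center-to-center spacing for n_spots on a square grid filling side_mm x side_mm."""
    per_side = math.isqrt(n_spots)
    if per_side * per_side != n_spots:
        raise ValueError("n_spots must be a perfect square for a square grid")
    return side_mm / per_side


def density_per_cm2(pitch_mm):
    """Spots per square centimetre for a square grid with the given pitch."""
    return (10.0 / pitch_mm) ** 2


if __name__ == "__main__":
    pin_pitch = grid_pitch_mm(9216, 220.0)  # 22 cm = 220 mm per side
    print("pin array pitch (mm):", round(pin_pitch, 2))
    print("pin array density (spots/cm2):", round(density_per_cm2(pin_pitch), 1))
    for microns in (200, 50):
        pitch = microns / 1000.0
        print(microns, "micron pitch (spots/cm2):", round(density_per_cm2(pitch)))
```

Under these assumptions the pin array works out to a pitch of about 2.3 mm, or roughly 19 spots per square centimetre, against about 2,500 to 40,000 regions per square centimetre at 200 to 50 micron spacing.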

Furthermore, current technology is directed at
performing such assays one at a time to a single array
of DNA molecules. For example, the most common method
for performing DNA hybridizations to arrays spotted
onto porous membrane involves sealing the membrane in a
plastic bag (Maniatis, et al., 1989) or a rotating
glass cylinder (Robbins Scientific) with the labeled
hybridization probe inside the sealed chamber. For
arrays made on non-porous surfaces, such as a
microscope slide, each array is incubated with the
labeled hybridization probe sealed under a coverslip.
These techniques require a separate sealed chamber for
each array, which makes handling many such arrays
inconvenient and time intensive.

Abouzied, et al. (1994) describes a method of
printing horizontal lines of antibodies on a
nitrocellulose membrane and separating regions of the
membrane with vertical stripes of a hydrophobic
material. Each vertical stripe is then reacted with a
different antigen and the reaction between the
immobilized antibody and an antigen is detected using a
standard ELISA colorimetric technique. Abouzied's
technique makes it possible to screen many
one-dimensional arrays simultaneously on a single sheet
of nitrocellulose. Abouzied makes the nitrocellulose
somewhat hydrophobic using a line drawn with PAP Pen
(Research Products International). However, Abouzied
does not describe a technology that is capable of
completely sealing the pores of the nitrocellulose. The
pores of the nitrocellulose are still physically open
and so the assay reagents can leak through the
hydrophobic barrier during extended high temperature
incubations or in the presence of detergents, which
makes the Abouzied technique unacceptable for DNA
hybridization assays.

Porous membranes with printed patterns of
hydrophilic/hydrophobic regions exist for applications
such as ordered arrays of bacteria colonies. QA Life
Sciences (San Diego CA) makes such a membrane with a
grid pattern printed on it. However, this membrane has
the same disadvantage as the Abouzied technique since
reagents can still flow between the gridded arrays,
making them unusable for separate DNA hybridization
assays.

Pall Corporation makes a 96-well plate with a
porous filter heat sealed to the bottom of the plate.
These plates are capable of containing different
reagents in each well without cross-contamination.
However, each well is intended to hold only one target
element whereas the invention described here makes a
microarray of many biomolecules in each subdivided
region of the solid support. Furthermore, the 96 well
plates are at least 1 cm thick and prevent the use of
the device for many colorimetric, fluorescent and
radioactive detection formats which require that the
membrane lie flat against the detection surface. The
invention described here requires no further processing
after the assay step since the barrier elements are
shallow and do not interfere with the detection step,
thereby greatly increasing convenience.

Hyseq Corporation has described a method of making
an "array of arrays" on a non-porous solid support for
use with their sequencing by hybridization technique.
The method described by Hyseq involves modifying the
chemistry of the solid support material to form a
hydrophobic grid pattern where each subdivided region
contains a microarray of biomolecules. Hyseq's flat
hydrophobic pattern does not make use of physical
blocking as an additional means of preventing
cross-contamination.



Summary of the Invention

The invention includes, in one aspect, a method of
forming a microarray of analyte-assay regions on a
solid support, where each region in the array has a
known amount of a selected, analyte-specific reagent.
The method involves first loading a solution of a
selected analyte-specific reagent in a
reagent-dispensing device having an elongate capillary
channel (i) formed by spaced-apart, coextensive elongate
members, (ii) adapted to hold a quantity of the reagent
solution and (iii) having a tip region at which aqueous
solution in the channel forms a meniscus. The channel
is preferably formed by a pair of spaced-apart tapered
elements.

The tip of the dispensing device is tapped against
a solid support at a defined position on the support
surface with an impulse effective to break the meniscus
in the capillary channel and deposit a selected volume
of solution on the surface, preferably a selected volume
in the range 0.01 to 100 nl. The two steps are
repeated until the desired array is formed.

The method may be practiced in forming a plurality
of such arrays, where the solution-depositing step is
applied to a selected position on each of a
plurality of solid supports at each repeat cycle.

The dispensing device may be loaded with a new 
solution, by the steps of (i) dipping the capillary 
channel of the device in a wash solution, (ii) removing 
wash solution drawn into the capillary channel, and 
(iii) dipping the capillary channel into the new 
reagent solution. 

Also included in the invention is an automated 
apparatus for forming a microarray of analyte-assay 
regions on a plurality of solid supports, where each 
region in the array has a known amount of a selected, 
analyte-specific reagent. The apparatus has a holder 
for holding, at known positions, a plurality of planar 
supports, and a reagent dispensing device of the type 
described above. 

The apparatus further includes positioning 
structure for positioning the dispensing device at a 
selected array position with respect to a support in 
said holder, and dispensing structure for moving the 
dispensing device into tapping engagement against a 
support with a selected impulse effective to deposit a 
selected volume on the support, e.g., a selected volume
in the volume range 0.01 to 100 nl.

The positioning and dispensing structures are
controlled by a control unit in the apparatus. The
unit operates to (i) place the dispensing device at a
loading station, (ii) move the capillary channel in the
device into a selected reagent at the loading station,
to load the dispensing device with the reagent, and
(iii) dispense the reagent at a defined array position
on each of the supports on said holder. The unit may
further operate, at the end of a dispensing cycle, to
wash the dispensing device by (i) placing the
dispensing device at a washing station, (ii) moving the
capillary channel in the device into a wash fluid, to
load the dispensing device with the fluid, and (iii)
removing the wash fluid prior to loading the dispensing
device with a fresh selected reagent.

The dispensing device in the apparatus may be one
of a plurality of such devices which are carried on the
arm for dispensing different analyte assay reagents at
selected spaced array positions.

In another aspect, the invention includes a
substrate with a surface having a microarray of at
least 10³ distinct polynucleotide or polypeptide
biopolymers in a surface area of less than about 1 cm².
Each distinct biopolymer (i) is disposed at a separate,
defined position in said array, (ii) has a length of at
least 50 subunits, and (iii) is present in a defined
amount between about 0.1 femtomoles and 100 nanomoles.
In one embodiment, the surface is a glass slide
surface coated with a polycationic polymer, such as
polylysine, and the biopolymers are polynucleotides.
In another embodiment, the substrate has a
water-impermeable backing, a water-permeable film formed
on the backing, and a grid formed on the film. The grid
is composed of intersecting water-impervious grid
elements extending from said backing to positions
raised above the surface of said film, and partitions
the film into a plurality of water-impervious cells. A
biopolymer array is formed within each well.

More generally, there is provided a substrate for
use in detecting binding of labeled polynucleotides to
one or more of a plurality of different-sequence,
immobilized polynucleotides. The substrate includes,
in one aspect, a glass support, a coating of a
polycationic polymer, such as polylysine, on said
surface of the support, and an array of distinct
polynucleotides electrostatically bound non-covalently
to said coating, where each distinct biopolymer is
disposed at a separate, defined position in a surface
array of polynucleotides.

In another aspect, the substrate includes a
water-impermeable backing, a water-permeable film formed
on the backing, and a grid formed on the film, where the
grid is composed of intersecting water-impervious grid
elements extending from the backing to positions raised
above the surface of the film, forming a plurality of
cells. A biopolymer array is formed within each cell.

Also forming part of the invention is a method of
detecting differential expression of each of a
plurality of genes in a first cell type, with respect
to expression of the same genes in a second cell type.
In practicing the method, there is first produced
fluorescent-labeled cDNA's from mRNA's isolated from
the two cell types, where the cDNA's from the first
and second cells are labeled with first and second
different fluorescent reporters.

A mixture of the labeled cDNA's from the two cell
types is added to an array of polynucleotides
representing a plurality of known genes derived from
the two cell types, under conditions that result in
hybridization of the cDNA's to complementary-sequence
polynucleotides in the array. The array is then
examined by fluorescence under fluorescence excitation
conditions in which (i) polynucleotides in the array
that are hybridized predominantly to cDNA's derived
from one of the first and second cell types give a
distinct first or second fluorescence emission color,
respectively, and (ii) polynucleotides in the array
that are hybridized to substantially equal numbers of
cDNA's derived from the first and second cell types
give a distinct combined fluorescence emission color,
respectively. The relative expression of known genes
in the two cell types can then be determined by the
observed fluorescence emission color of each spot.
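
The color readout just described can be illustrated with a short Python sketch. It is offered only as an illustration and is not part of the described method: the function name `classify_spot`, the use of background-corrected intensities, and the two-fold ratio cutoff are all assumptions introduced here.

```python
def classify_spot(first_signal, second_signal, fold=2.0):
    """Classify one array spot from two background-corrected intensities.

    first_signal and second_signal are the emission intensities of the
    first and second fluorescent reporters at the spot. The cutoff `fold`
    is a hypothetical value chosen for illustration only.
    """
    if first_signal <= 0 and second_signal <= 0:
        return "no signal"
    # Predominantly first-reporter cDNA hybridized at this spot.
    if second_signal <= 0 or first_signal / second_signal >= fold:
        return "first color"
    # Predominantly second-reporter cDNA hybridized at this spot.
    if first_signal <= 0 or second_signal / first_signal >= fold:
        return "second color"
    # Roughly equal amounts from both cell types.
    return "combined color"
```

A spot reading 100 and 10 in the two channels would be reported as the first color under these assumed settings, while a spot reading 50 and 60 would be reported as the combined color.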

These and other objects and features of the
invention will become more fully apparent when the
following detailed description of the invention is read
in conjunction with the accompanying figures.



Brief Description of the Drawings 
Fig. 1 is a side view of a reagent-dispensing
device having an open-capillary dispensing head
constructed for use in one embodiment of the invention;

Figs. 2A-2C illustrate steps in the delivery of a
fixed-volume bead on a hydrophobic surface employing
the dispensing head from Fig. 1, in accordance with one
embodiment of the method of the invention;

Fig. 3 shows a portion of a two-dimensional array
of analyte-assay regions constructed according to the
method of the invention;

Fig. 4 is a planar view showing components of an
automated apparatus for forming arrays in accordance
with the invention;

Fig. 5 shows a fluorescent image of an actual 20 x
20 array of 400 fluorescently-labeled DNA samples
immobilized on a poly-l-lysine coated slide, where the
total area covered by the 400 element array is 16
square millimeters;

Fig. 6 is a fluorescent image of a 1.8 cm x 1.8 cm
microarray containing lambda clones with yeast inserts,
the fluorescent signal arising from the hybridization
to the array with approximately half the yeast genome
labeled with a green fluorophore and the other half
with a red fluorophore;

Fig. 7 shows the translation of the hybridization
image of Fig. 6 into a karyotype of the yeast genome,
where the elements of the Fig. 6 microarray contain yeast
DNA sequences that have been previously physically
mapped in the yeast genome;

Fig. 8 shows a fluorescent image of a 0.5 cm x 0.5
cm microarray of 24 cDNA clones, where the microarray
was hybridized simultaneously with total cDNA from a wild
type Arabidopsis plant labeled with a green fluorophore
and total cDNA from a transgenic Arabidopsis plant
labeled with a red fluorophore, and the arrow points to
the cDNA clone representing the gene introduced into
the transgenic Arabidopsis plant;

Fig. 9 shows a plan view of a substrate having an
array of cells formed by barrier elements in the form
of a grid;

Fig. 10 shows an enlarged plan view of one of the
cells in the substrate in Fig. 9, showing an array of
polynucleotide regions in the cell;

Fig. 11 is an enlarged sectional view of the
substrate in Fig. 9, taken along a section line in that
figure; and

Fig. 12 is a scanned image of a 3 cm x 3 cm
nitrocellulose solid support containing four identical
arrays of M13 clones in each of four quadrants, where
each quadrant was hybridized simultaneously to a
different oligonucleotide using an open face
hybridization method.

WO 95/35505    PCT/US95/07659


Detailed Description of the Invention 

I. Definitions 

Unless indicated otherwise, the terms defined
below have the following meanings:

"Ligand" refers to one member of a
ligand/anti-ligand binding pair. The ligand may be, for
example, one of the nucleic acid strands in a
complementary, hybridized nucleic acid duplex binding
pair; an effector molecule in an effector/receptor
binding pair; or an antigen in an antigen/antibody or
antigen/antibody fragment binding pair.

"Antiligand" refers to the opposite member of a
ligand/anti-ligand binding pair. The antiligand may be
the other of the nucleic acid strands in a
complementary, hybridized nucleic acid duplex binding
pair; the receptor molecule in an effector/receptor
binding pair; or an antibody or antibody fragment
molecule in antigen/antibody or antigen/antibody
fragment binding pair, respectively.

"Analyte" or "analyte molecule" refers to a
molecule, typically a macromolecule, such as a
polynucleotide or polypeptide, whose presence, amount,
and/or identity are to be determined. The analyte is
one member of a ligand/anti-ligand pair.

"Analyte-specific assay reagent" refers to a
molecule effective to bind specifically to an analyte
molecule. The reagent is the opposite member of a
ligand/anti-ligand binding pair.

An "array of regions on a solid support" is a
linear or two-dimensional array of preferably discrete
regions, each having a finite area, formed on the
surface of a solid support.

A "microarray" is an array of regions having a
density of discrete regions of at least about 100/cm²,
and preferably at least about 1000/cm². The regions in
a microarray have typical dimensions, e.g., diameters,
in the range of between about 10-250 µm, and are
separated from other regions in the array by about the
same distance.

A support surface is "hydrophobic" if an
aqueous-medium droplet applied to the surface does not
spread out substantially beyond the area size of the
applied droplet. That is, the surface acts to prevent
spreading of the droplet applied to the surface by
hydrophobic interaction with the droplet.

A "meniscus" means a concave or convex surface 
that forms on the bottom of a liquid in a channel as a 
result of the surface tension of the liquid. 

"Distinct biopolymers", as applied to the 
biopolymers forming a microarray, means an array member 
which is distinct from other array members on the basis 
of a different biopolymer sequence, and/or different 
concentrations of the same or distinct biopolymers, 
and/or different mixtures of distinct or different- 
concentration biopolymers. Thus an array of "distinct 
polynucleotides" means an array containing, as its 
members, (i) distinct polynucleotides, which may have a 
defined amount in each member, (ii) different, graded 
concentrations of given-sequence polynucleotides, 
and/or (iii) different-composition mixtures of two or 
more distinct polynucleotides. 

"Cell type" means a cell from a given source, 
e.g., a tissue, or organ, or a cell in a given state of 
differentiation, or a cell associated with a given
pathology or genetic makeup. 

II. Method of Microarray Formation

This section describes a method of forming a
microarray of analyte-assay regions on a solid support
or substrate, where each region in the array has a
known amount of a selected, analyte-specific reagent.

Fig. 1 illustrates, in a partially schematic view,
a reagent-dispensing device 10 useful in practicing the
method. The device generally includes a reagent
dispenser 12 having an elongate open capillary channel
14 adapted to hold a quantity of the reagent solution,
such as indicated at 16, as will be described below.
The capillary channel is formed by a pair of
spaced-apart, coextensive, elongate members 12a, 12b
which are tapered toward one another and converge at a
tip or tip region 18 at the lower end of the channel.
More generally, the open channel is formed by at least
two elongate, spaced-apart members adapted to hold a
quantity of reagent solutions and having a tip region
at which aqueous solution in the channel forms a
meniscus, such as the concave meniscus illustrated at
20 in Fig. 2A. The advantages of the open channel
construction of the dispenser are discussed below.

With continued reference to Fig. 1, the dispenser
device also includes structure for moving the dispenser
rapidly toward and away from a support surface, for
effecting deposition of a known amount of solution in
the dispenser on a support, as will be described below
with reference to Figs. 2A-2C. In the embodiment
shown, this structure includes a solenoid 22 which is
activatable to draw a solenoid piston 24 rapidly
downwardly, then release the piston, e.g., under spring
bias, to a normal, raised position, as shown. The
dispenser is carried on the piston by a connecting
member 26, as shown. The just-described moving
structure is also referred to herein as dispensing
means for moving the dispenser into engagement with a
solid support, for dispensing a known volume of fluid
on the support.

The dispensing device just described is carried on
an arm 28 that may be moved either linearly or in an
x-y plane to position the dispenser at a selected
deposition position, as will be described.

Figs. 2A-2C illustrate the method of depositing a
known amount of reagent solution in the just-described
dispenser on the surface of a solid support, such as
the support indicated at 30. The support is a polymer,
glass, or other solid-material support having a surface
indicated at 31.

In one general embodiment, the surface is a
relatively hydrophilic, i.e., wettable surface, such as
a surface having native, bound or covalently attached
charged groups. One such surface described below is a
glass surface having an absorbed layer of a
polycationic polymer, such as poly-l-lysine.

In another embodiment, the surface has or is
formed to have a relatively hydrophobic character,
i.e., one that causes aqueous medium deposited on the
surface to bead. A variety of known hydrophobic
polymers, such as polystyrene, polypropylene, or
polyethylene have desired hydrophobic properties, as do
glass and a variety of lubricant or other hydrophobic
films that may be applied to the support surface.

Initially, the dispenser is loaded with a selected
analyte-specific reagent solution, such as by dipping
the dispenser tip, after washing, into a solution of
the reagent, and allowing filling by capillary flow
into the dispenser channel. The dispenser is now moved
to a selected position with respect to a support
surface, placing the dispenser tip directly above the
support-surface position at which the reagent is to be
deposited. This movement takes place with the
dispenser tip in its raised position, as seen in Fig.
2A, where the tip is typically 1-5 mm
above the surface of the substrate.

With the dispenser so positioned, solenoid 22 is
now activated to cause the dispenser tip to move
rapidly toward and away from the substrate surface,
making momentary contact with the surface, in effect,
tapping the tip of the dispenser against the support
surface. The tapping movement of the tip against the
surface acts to break the liquid meniscus in the tip
channel, bringing the liquid in the tip into contact
with the support surface. This, in turn, produces a
flowing of the liquid into the capillary space between
the tip and the surface, acting to draw liquid out of
the dispenser channel, as seen in Fig. 2B.

Fig. 2C shows flow of fluid from the tip onto the
support surface, which in this case is a hydrophobic
surface. The figure illustrates that liquid continues
to flow from the dispenser onto the support surface
until it forms a liquid bead 32. At a given bead size,
i.e., volume, the tendency of liquid to flow onto the
surface will be balanced by the hydrophobic surface
interaction of the bead with the support surface, which
acts to limit the total bead area on the surface, and
by the surface tension of the droplet, which tends
toward a given bead curvature. At this point, a given
bead volume will have formed, and continued contact of
the dispenser tip with the bead, as the dispenser tip
is being withdrawn, will have little or no effect on
bead volume.




For liquid-dispensing on a more hydrophilic
surface, the liquid will have less of a tendency to
bead, and the dispensed volume will be more sensitive
to the total dwell time of the dispenser tip in the
immediate vicinity of the support surface, e.g., the
positions illustrated in Figs. 2B and 2C.

The desired deposition volume, i.e., bead volume,
formed by this method is preferably in the range 2 pl
(picoliters) to 2 nl (nanoliters), although volumes as
high as 100 nl or more may be dispensed. It will be
appreciated that the selected dispensed volume will
depend on (i) the "footprint" of the dispenser tip,
i.e., the size of the area spanned by the tip, (ii) the
hydrophobicity of the support surface, and (iii) the
time of contact with and rate of withdrawal of the tip
from the support surface. In addition, bead size may
be reduced by increasing the viscosity of the medium,
effectively reducing the flow time of liquid from the
dispenser onto the support surface. The drop size may
be further constrained by depositing the drop in a
hydrophilic region surrounded by a hydrophobic grid
pattern on the support surface.

In a typical embodiment, the dispenser tip is
tapped rapidly against the support surface, with a
total residence time in contact with the support of
less than about 1 msec, and a rate of upward travel
from the surface of about 10 cm/sec.

Assuming that the bead that forms on contact with
the surface is a hemispherical bead, with a diameter
approximately equal to the width of the dispenser tip,
as shown in Fig. 2C, the volume of the bead formed in
relation to dispenser tip width (d) is given in Table 1
below. As seen, the volume of the bead ranges between
2 pl to 2 nl as the width size is increased from about
20 to 200 µm.

Table 1

d          Volume (nl)
20 µm      2 x 10⁻³
50 µm      3.1 x 10⁻²
100 µm     2.5 x 10⁻¹
200 µm     2
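
The hemispherical-bead assumption behind Table 1 can be checked with a short Python sketch. The function name `hemisphere_volume_nl` is introduced here for illustration only. It applies the standard formula for the volume of a hemisphere, (2/3)πr³, with the bead diameter taken equal to the tip width d, as assumed in the text.

```python
import math


def hemisphere_volume_nl(d_um):
    """Volume, in nanoliters, of a hemispherical bead of diameter d_um (µm)."""
    r_um = d_um / 2.0
    volume_um3 = (2.0 / 3.0) * math.pi * r_um ** 3
    # 1 nl = 1e-12 cubic meters = 1e6 cubic micrometers
    return volume_um3 * 1e-6


# Print the bead volume for each tip width listed in Table 1.
for d in (20, 50, 100, 200):
    print(f"{d:>4} um  {hemisphere_volume_nl(d):.2e} nl")
```

The computed values span roughly 2 pl to 2 nl over the 20 to 200 µm range, in line with the table. Small differences from the tabulated figures are to be expected, since the table entries are approximate.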





At a given tip size, bead volume can be reduced in
a controlled fashion by increasing surface
hydrophobicity, reducing time of contact of the tip
with the surface, increasing rate of movement of the
tip away from the surface, and/or increasing the
viscosity of the medium. Once these parameters are
fixed, a selected deposition volume in the desired pl
to nl range can be achieved in a repeatable fashion.

After depositing a bead at one selected location
on a support, the tip is typically moved to a
corresponding position on a second support, a droplet
is deposited at that position, and this process is
repeated until a liquid droplet of the reagent has been
deposited at a selected position on each of a plurality
of supports.

The tip is then washed to remove the reagent
liquid, filled with another reagent liquid, and this
reagent is now deposited at another array position
on each of the supports. In one embodiment, the tip is
washed and refilled by the steps of (i) dipping the
capillary channel of the device in a wash solution,
(ii) removing wash solution drawn into the capillary
channel, and (iii) dipping the capillary channel into
the new reagent solution.

From the foregoing, it will be appreciated that
the tweezers-like, open-capillary dispenser tip
provides the advantages that (i) the open channel of
the tip facilitates rapid, efficient washing and drying
before reloading the tip with a new reagent, (ii)
passive capillary action can load the sample directly
from a standard microwell plate while retaining
sufficient sample in the open capillary reservoir for
the printing of numerous arrays, (iii) open capillaries
are less prone to clogging than closed capillaries, and
(iv) open capillaries do not require a perfectly faced
bottom surface for fluid delivery.

A portion of a microarray 36 formed on the surface
38 of a solid support 40 in accordance with the method
just described is shown in Fig. 3. The array is formed
of a plurality of analyte-specific reagent regions,
such as regions 42, where each region may include a
different analyte-specific reagent. As indicated
above, the diameter of each region is preferably
between about 20-200 µm. The spacing between each
region and its closest (non-diagonal) neighbor,
measured from center-to-center (indicated at 44), is
preferably in the range of about 20-400 µm. Thus, for
example, an array having a center-to-center spacing of
about 250 µm contains about 40 regions/cm or 1,600
regions/cm². After formation of the array, the support
is treated to evaporate the liquid of the droplet
forming each region, to leave a desired array of dried,
relatively flat regions. This drying may be done by
heating or under vacuum.
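
The density figure quoted above follows directly from the center-to-center spacing. The Python sketch below shows the arithmetic. The function name `regions_per_cm2` is introduced here for illustration, and a square grid with equal spacing in both directions is assumed.

```python
def regions_per_cm2(spacing_um):
    """Regions per square centimeter for a square grid with the given
    center-to-center spacing in micrometers."""
    per_cm = 10_000.0 / spacing_um  # 1 cm = 10,000 µm
    return per_cm ** 2


print(regions_per_cm2(250))  # 1600.0
```

A spacing of 250 µm gives 40 regions per linear centimeter and therefore 1,600 regions per square centimeter, matching the figure in the text.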

In some cases, it is desired to first rehydrate
the droplets containing the analyte reagents to allow
for more time for adsorption to the solid support. It
is also possible to spot out the analyte reagents in a
humid environment so that droplets do not dry until the
arraying operation is complete.

III. Automated Apparatus for Forming Arrays

In another aspect, the invention includes an
automated apparatus for forming an array of
analyte-assay regions on a solid support, where each
region in the array has a known amount of a selected,
analyte-specific reagent.

The apparatus is shown in planar, and partially
schematic, view in Fig. 4. A dispenser device 72 in the
apparatus has the basic construction described above
with respect to Fig. 1, and includes a dispenser 74
having an open-capillary channel terminating at a tip,
substantially as shown in Figs. 1 and 2A-2C.

The dispenser is mounted in the device for
movement toward and away from a dispensing position at
which the tip of the dispenser taps a support surface,
to dispense a selected volume of reagent solution, as
described above. This movement is effected by a
solenoid 76 as described above. Solenoid 76 is under
the control of a control unit 77 whose operation will
be described below. The solenoid is also referred to
herein as dispensing means for moving the device into
tapping engagement with a support, when the device is
positioned at a defined array position with respect to
that support.

The dispenser device is carried on an arm 74 which
is threadedly mounted on a worm screw 80 driven
(rotated) in a desired direction by a stepper motor 82
also under the control of unit 77. At its left end in
the figure screw 80 is carried in a sleeve 84 for
rotation about the screw axis. At its other end, the
screw is mounted to the drive shaft of the stepper
motor, which in turn is carried on a sleeve 86. The
dispenser device, worm screw, the two sleeves mounting
the worm screw, and the stepper motor used in moving
the device in the "x" (horizontal) direction in the
figure form what is referred to here collectively as a
displacement assembly 86.

The displacement assembly is constructed to
produce precise, micro-range movement in the direction
of the screw, i.e., along an x axis in the figure. In
one mode, the assembly functions to move the dispenser
in x-axis increments having a selected distance in the
range 5-25 µm. In another mode, the dispenser unit may
be moved in precise x-axis increments of several
microns or more, for positioning the dispenser at
associated positions on adjacent supports, as will be
described below.

The displacement assembly, in turn, is mounted for
movement in the "y" (vertical) axis of the figure, for
positioning the dispenser at a selected y axis
position. The structure mounting the assembly includes
a fixed rod 88 mounted rigidly between a pair of frame
bars 90, 92, and a worm screw 94 mounted for rotation
between a pair of frame bars 96, 98. The worm screw is
driven (rotated) by a stepper motor 100 which operates
under the control of unit 77. The motor is mounted on
bar 96, as shown.

The structure just described, including worm screw
94 and motor 100, is constructed to produce precise,
micro-range movement in the direction of the screw,
i.e., along a y axis in the figure. As above, the
structure functions in one mode to move the dispenser
in y-axis increments having a selected distance in the
range 5-250 µm, and in a second mode, to move the
dispenser in precise y-axis increments of several
microns (µm) or more, for positioning the dispenser at
associated positions on adjacent supports.

The displacement assembly and structure for moving
this assembly in the y axis are referred to herein
collectively as positioning means for positioning the
dispensing device at a selected array position with
respect to a support.
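
The positioning just described amounts to converting an array position into x and y displacements, and each displacement into a number of fixed motor increments. The Python sketch below illustrates this under stated assumptions. The function names, the square grid with a single pitch, and the origin at the first spot are all hypothetical and are not taken from the apparatus as described.

```python
def spot_position_um(row, col, pitch_um, origin_um=(0.0, 0.0)):
    """Return the (x, y) position in micrometers of the spot at (row, col)
    on a square grid with center-to-center spacing pitch_um."""
    x0, y0 = origin_um
    return (x0 + col * pitch_um, y0 + row * pitch_um)


def steps_for_move(distance_um, step_um):
    """Number of fixed increments of size step_um needed to travel
    distance_um, rounded to the nearest whole increment."""
    return round(distance_um / step_um)


x, y = spot_position_um(row=2, col=3, pitch_um=250.0)
print(x, y, steps_for_move(x, 25.0), steps_for_move(y, 25.0))
```

With an assumed pitch of 250 µm and an assumed increment of 25 µm, the spot at row 2, column 3 lies 750 µm along x and 500 µm along y from the origin, or 30 and 20 increments respectively.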

A holder 102 in the apparatus functions to hold a
plurality of supports, such as supports 104 on which
the microarrays of reagent regions are to be formed by
the apparatus. The holder provides a number of
recessed slots, such as slot 106, which receive the
supports, and position them at precise selected
positions with respect to the frame bars on which the
dispenser moving means is mounted.

As noted above, the control unit in the device
functions to actuate the two stepper motors and
dispenser solenoid in a sequence designed for automated
operation of the apparatus in forming a selected
microarray of reagent regions on each of a plurality of
supports.

The control unit is constructed, according to
conventional microprocessor control principles, to
provide appropriate signals to each of the solenoid and
each of the stepper motors, in a given timed sequence
and for appropriate signalling time. The construction
of the unit, and the settings that are selected by the
user to achieve a desired array pattern, will be
understood from the following description of a typical
apparatus operation.

Initially, one or more supports are placed in one
or more slots in the holder. The dispenser is then
moved to a position directly above a well (not shown)
containing a solution of the first reagent to be
dispensed on the support(s). The dispenser solenoid is
now actuated to lower the dispenser tip into this well,
causing the capillary channel in the dispenser to fill.
Motors 82, 100 are now actuated to position the
dispenser at a selected array position at the first of
the supports. Solenoid actuation of the dispenser is
then effective to dispense a selected-volume droplet of
that reagent at this location. As noted above, this
operation is effective to dispense a selected volume
preferably between 2 pl and 2 nl of the reagent
solution.

The dispenser is now moved to the corresponding 
position at an adjacent support and a similar volume of 
the solution is dispensed at this position. The 
process is repeated until the reagent has been 
dispensed at this preselected corresponding position on
each of the supports. 

Where it is desired to dispense a single reagent
at more than two array positions on a support, the
dispenser may be moved to different array positions at
each support, before moving the dispenser to a new
support, or solution can be dispensed at one selected
position on each support, and the cycle then repeated
for each new array position.

To dispense the next reagent, the dispenser is
positioned over a wash solution (not shown), and the
dispenser tip is dipped in and out of this solution
until the reagent solution has been substantially
washed from the tip. Solution can be removed from the
tip, after each dipping, by vacuum, compressed air
spray, sponge, or the like.

The dispenser tip is now dipped in a second
reagent well, and the filled tip is moved to a second
selected array position in the first support. The
process of dispensing reagent at each of the
corresponding second-array positions is then carried out as
above. This process is repeated until an entire
microarray of reagent solutions on each of the supports
has been formed.
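
The operating sequence just described can be sketched as an ordered list of tap events. The following Python sketch is illustrative only and is not the control program of the apparatus; the function name `print_schedule` and the representation of an array position as a (row, column) tuple are assumptions introduced for this example.

```python
def print_schedule(reagents, positions, supports):
    """Return tap events in the order described in the text.

    Each reagent is loaded once, tapped at its own array position on
    every support in turn, and only then is the tip washed and the
    next reagent loaded. (Names and data layout are hypothetical.)
    """
    if len(reagents) != len(positions):
        raise ValueError("each reagent needs exactly one array position")
    events = []
    for reagent, position in zip(reagents, positions):
        # tip is dipped in the reagent well here (one fill per reagent)
        for support in supports:
            events.append((reagent, position, support))
        # tip is washed here before the next reagent is loaded
    return events


schedule = print_schedule(["A", "B"], [(0, 0), (0, 1)], [1, 2, 3])
for event in schedule:
    print(event)
```

In this ordering the dispenser visits every support once per reagent, so the number of wash cycles equals the number of reagents rather than the number of spots.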



IV. Microarray Substrate



WO 95/35505 



PCT/US95/07659 




This section describes embodiments of a substrate
having a microarray of biological polymers carried on
the substrate surface. Subsection A describes a
multi-cell substrate, each cell of which contains a
microarray, and preferably an identical microarray, of
distinct biopolymers, such as distinct polynucleotides,
formed on a porous surface. Subsection B describes a
microarray of distinct polynucleotides bound on a glass
slide coated with a polycationic polymer.


A. Multi-Cell Substrate 

Fig. 9 illustrates, in plan view, a substrate 110
constructed according to the invention. The substrate
has an 8 x 12 rectangular array 112 of cells, such as
cells 114, 116, formed on the substrate surface. With
reference to Fig. 10, each cell, such as cell 114, in
turn supports a microarray 118 of distinct biopolymers,
such as polypeptides or polynucleotides, at known,
addressable regions of the microarray. Two such
regions forming the microarray are indicated at 120,
and correspond to regions, such as regions 42, forming
the microarray of distinct biopolymers shown in Fig. 3.

The 96-cell array shown in Fig. 9 typically has
array dimensions between about 12 and 244 mm in width
and 8 and 400 mm in length, with the cells in the array
having width and length dimensions of 1/12 and 1/8 the
array width and length dimensions, respectively, i.e.,
between about 1 and 20 mm in width and 1 and 50 mm in
length.
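
The cell dimensions stated above follow directly from dividing the array dimensions by the 12 x 8 cell layout. The short Python sketch below reproduces that arithmetic; the function name `cell_size_mm` is an assumption for illustration, and the width of the grid lines is ignored.

```python
def cell_size_mm(array_width_mm, array_length_mm, cols=12, rows=8):
    """Cell width and length for an array divided into cols x rows cells.

    Grid-line width is neglected, as in the text's 1/12 and 1/8 figures.
    """
    return array_width_mm / cols, array_length_mm / rows


smallest = cell_size_mm(12, 8)     # lower end of the stated range
largest = cell_size_mm(244, 400)   # upper end of the stated range
print(smallest, largest)
```

The upper-end width works out to slightly more than 20 mm, which the text rounds to "about 20".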

The construction of the substrate is shown
cross-sectionally in Fig. 11, which is an enlarged sectional
view taken along view line 124 in Fig. 9. The
substrate includes a water-impermeable backing 126,
such as a glass slide or rigid polymer sheet. Formed
on the surface of the backing is a water-permeable film
128. The film is formed of a porous membrane material,
such as nitrocellulose membrane, or a porous web
material, such as a nylon, polypropylene, or PVDF
porous polymer material. The thickness of the film is
preferably between about 10 and 1000 µm. The film may
be applied to the backing by spraying or coating
uncured material on the backing, or by applying a
preformed membrane to the backing. The backing and
film may be obtained as a preformed unit from a
commercial source, e.g., a plastic-backed
nitrocellulose film available from Schleicher and
Schuell Corporation.

With continued reference to Fig. 11, the film- 
covered surface in the substrate is partitioned into a 
desired array of cells by water-impermeable grid lines,
such as lines 130, 132, which have infiltrated the film 
down to the level of the backing, and extend above the 
surface of the film as shown, typically a distance of 
100 to 2000 nm above the film surface. 

The grid lines are formed on the substrate by 
laying down an uncured or otherwise flowable resin or
elastomer solution in an array grid, allowing the 
material to infiltrate the porous film down to the 
backing, then curing or otherwise hardening the grid 
lines to form the cell-array substrate. 

One preferred material for the grid is a flowable
silicone available from Loctite Corporation. The
barrier material can be extruded through a narrow
syringe (e.g., 22 gauge) using air pressure or
mechanical pressure. The syringe is moved relative to
the solid support to print the barrier elements as a
grid pattern. The extruded bead of silicone wicks into
the pores of the solid support and cures to form a
shallow waterproof barrier separating the regions of
the solid support.

In alternative embodiments, the barrier element
can be a wax-based material or a thermoset material
such as epoxy. The barrier material can also be a
UV-curing polymer which is exposed to UV light after being
printed onto the solid support. The barrier material
may also be applied to the solid support using printing
techniques such as silk-screen printing. The barrier
material may also be a heat-seal stamping of the porous
solid support which seals its pores and forms a
water-impervious barrier element. The barrier material may
also be a shallow grid which is laminated or otherwise
adhered to the solid support.

In addition to plastic-backed nitrocellulose, the
solid support can be virtually any porous membrane with
or without a non-porous backing. Such membranes are
readily available from numerous vendors and are made
from nylon, PVDF, polysulfone and the like. In an
alternative embodiment, the barrier element may also be
used to adhere the porous membrane to a non-porous
backing in addition to functioning as a barrier to
prevent cross contamination of the assay reagents.

In an alternative embodiment, the solid support 
can be of a non-porous material. The barrier can be 
printed either before or after the microarray of 
biomolecules is printed on the solid support.

As can be appreciated, the cells formed by the
grid lines and the underlying backing are
water-impermeable, having side barriers projecting above the
porous film in the cells. Thus, defined-volume samples
can be placed in each well without risk of
cross-contamination with sample material in adjacent cells.
In Fig. 11, defined-volume samples, such as sample
134, are shown in the cells.

As noted above, each well contains a microarray of
distinct biopolymers. In one general embodiment, the
microarrays in the wells are identical arrays of
distinct biopolymers, e.g., different sequence
polynucleotides. Such arrays can be formed in
accordance with the methods described in Section II, by
depositing a first selected polynucleotide at the same
selected microarray position in each of the cells, then
depositing a second polynucleotide at a different
microarray position in each well, and so on until a
complete, identical microarray is formed in each cell.

In a preferred embodiment, each microarray
contains about 10³ distinct polynucleotide or
polypeptide biopolymers per surface area of less than
about 1 cm². Also in a preferred embodiment, the
biopolymers in each microarray region are present in a
defined amount between about 0.1 femtomoles and 100
nanomoles. The ability to form high-density arrays of
biopolymers, where each region is formed of a
well-defined amount of deposited material, can be achieved
in accordance with the microarray-forming method
described in Section II.

Also in a preferred embodiment, the biopolymers
are polynucleotides having lengths of at least about 50
bp, i.e., substantially longer than oligonucleotides
which can be formed in high-density arrays by schemes
involving parallel, step-wise polymer synthesis on the
array surface.

In the case of a polynucleotide array, in an assay
procedure, a small volume of the labeled DNA probe
mixture in a standard hybridization solution is loaded
onto each cell. The solution will spread to cover the
entire microarray and stop at the barrier elements.
The solid support is then incubated in a humid chamber
at the appropriate temperature as required by the
assay.

Each assay may be conducted in an "open-face"
format where no further sealing step is required, since
the hybridization solution will be kept properly
hydrated by the water vapor in the humid chamber. At
the conclusion of the incubation step, the entire solid
support containing the numerous microarrays is rinsed
quickly enough to dilute the assay reagents so that no
significant cross contamination occurs. The entire
solid support is then reacted with detection reagents
if needed and analyzed using standard colorimetric,
radioactive or fluorescent detection means. All
processing and detection steps are performed
simultaneously on all of the microarrays on the solid
support, ensuring uniform assay conditions for all of
the microarrays on the solid support.

B. Glass-Slide Polynucleotide Array

Fig. 5 shows a substrate 136 formed according to
another aspect of the invention, and intended for use
in detecting binding of labeled polynucleotides to one
or more of a plurality of distinct polynucleotides. The
substrate includes a glass substrate 138 having formed
on its surface, a coating of a polycationic polymer,
preferably a cationic polypeptide, such as polylysine
or polyarginine. Formed on the polycationic coating is
a microarray 140 of distinct polynucleotides, each
localized at known selected array regions, such as
regions 142.

The slide is coated by placing a uniform-thickness
film of a polycationic polymer, e.g., poly-l-lysine, on
the surface of a slide and drying the film to form a
dried coating. The amount of polycationic polymer
added is sufficient to form at least a monolayer of
polymers on the glass surface. The polymer film is
bound to the surface via electrostatic binding between
negative silyl-OH groups on the surface and charged
amine groups in the polymers. Poly-l-lysine coated
glass slides may be obtained commercially, e.g., from
Sigma Chemical Co. (St. Louis, MO).

To form the microarray, defined volumes of
distinct polynucleotides are deposited on the
polymer-coated slide, as described in Section II. According to
an important feature of the substrate, the deposited
polynucleotides remain bound to the coated slide
surface non-covalently when an aqueous DNA sample is
applied to the substrate under conditions which allow
hybridization of reporter-labeled polynucleotides in
the sample to complementary-sequence (single-stranded)
polynucleotides in the substrate array. The method is
illustrated in Examples 1 and 2.

To illustrate this feature, a substrate of the
type just described, but having an array of
same-sequence polynucleotides, was mixed with
fluorescent-labeled complementary DNA under hybridization
conditions. After washing to remove non-hybridized
material, the substrate was examined by low-power
fluorescence microscopy. The array can be visualized
by the relatively uniform labeling pattern of the array
regions.

In a preferred embodiment, each microarray
contains at least 10³ distinct polynucleotide or
polypeptide biopolymers per surface area of less than
about 1 cm². In the embodiment shown in Fig. 5, the
microarray contains 400 regions in an area of about 16
mm², or 2.5 x 10³ regions/cm². Also in a preferred
embodiment, the polynucleotides in each microarray
region are present in a defined amount between about
0.1 femtomoles and 100 nanomoles. As above, the ability
to form high-density arrays of this type, where each
region is formed of a well-defined amount of deposited
material, can be achieved in accordance with the
microarray-forming method described in Section II.
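
The density figure quoted for the Fig. 5 embodiment can be checked with a one-line unit conversion. The Python sketch below is offered only as a worked example; the function name `regions_per_cm2` is an assumption.

```python
def regions_per_cm2(n_regions, area_mm2):
    """Array density in regions per square centimetre (1 cm^2 = 100 mm^2)."""
    return n_regions * 100.0 / area_mm2


density = regions_per_cm2(400, 16)   # 400 regions in about 16 mm^2
print(density)
```

The result, 2500 regions/cm², matches the stated 2.5 x 10³ regions/cm² and exceeds the preferred minimum of 10³ per cm².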

Also in a preferred embodiment, the
polynucleotides have lengths of at least about 50 bp,
i.e., substantially longer than oligonucleotides which
can be formed in high-density arrays by various in situ
synthesis schemes.


V. Utility 

Microarrays of immobilized nucleic acid sequences 
prepared in accordance with the invention can be used 
for large scale hybridization assays in numerous
genetic applications, including genetic and physical
mapping of genomes, monitoring of gene expression, DNA
sequencing, genetic diagnosis, genotyping of organisms, 
and distribution of DNA reagents to researchers. 

For gene mapping, a gene or a cloned DNA fragment
is hybridized to an ordered array of DNA fragments, and
the identity of the DNA elements applied to the array
is unambiguously established by the pixel or pattern of
pixels of the array that are detected. One application
of such arrays for creating a genetic map is described
by Nelson, et al. (1993). In constructing physical
maps of the genome, arrays of immobilized cloned DNA
fragments are hybridized with other cloned DNA
fragments to establish whether the cloned fragments in
the probe mixture overlap and are therefore contiguous
to the immobilized clones on the array. For example,
Lehrach, et al., describe such a process.

The arrays of immobilized DNA fragments may also
be used for genetic diagnostics. To illustrate, an
array containing multiple forms of a mutated gene or
genes can be probed with a labeled mixture of a
patient's DNA which will preferentially interact with
only one of the immobilized versions of the gene.
The detection of this interaction can lead to a
medical diagnosis. Arrays of immobilized DNA fragments
can also be used in DNA probe diagnostics. For
example, the identity of a pathogenic microorganism can
be established unambiguously by hybridizing a sample of
the unknown pathogen's DNA to an array containing many
types of known pathogenic DNA. A similar technique can
also be used for unambiguous genotyping of any
organism. Other molecules of genetic interest, such as
cDNAs and RNAs, can be immobilized on the array or
alternately used as the labeled probe mixture that is
applied to the array.

In one application, an array of cDNA clones
representing genes is hybridized with total cDNA from
an organism to monitor gene expression for research or
diagnostic purposes. Labeling total cDNA from a normal
cell with one color fluorophore and total cDNA from a
diseased cell with another color fluorophore and
simultaneously hybridizing the two cDNA samples to the
same array of cDNA clones allows for differential gene
expression to be measured as the ratio of the two
fluorophore intensities. This two-color experiment can
be used to monitor gene expression in different tissue
types, disease states, response to drugs, or response
to environmental factors. An example of this approach
is illustrated in Example 2, described with respect to
Fig. 8.
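
The ratio measurement described above can be illustrated with a short calculation. The Python sketch below is a minimal example under stated assumptions: the use of a base-2 logarithm, the intensity floor of 1.0 and the function name `expression_log2_ratio` are choices made for this illustration and are not taken from the text.

```python
import math


def expression_log2_ratio(diseased_intensity, normal_intensity, floor=1.0):
    """Log2 ratio of the two fluorophore intensities measured at one spot.

    Intensities below `floor` are raised to `floor` so that a spot with
    no signal in one channel does not cause a division by zero.
    (The floor value is an assumption for this sketch.)
    """
    diseased = max(diseased_intensity, floor)
    normal = max(normal_intensity, floor)
    return math.log2(diseased / normal)


up = expression_log2_ratio(800, 100)      # eightfold higher in diseased cell
same = expression_log2_ratio(100, 100)    # equal expression
down = expression_log2_ratio(0, 4)        # signal only in the normal channel
print(up, same, down)
```

A positive value indicates higher signal from the diseased-cell label, a negative value higher signal from the normal-cell label, and a value near zero similar expression in both.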

By way of example and without implying a
limitation of scope, such a procedure could be used to
simultaneously screen many patients against all known
mutations in a disease gene. This invention could be
used in the form of, for example, 96 identical 0.9 cm x
2.2 cm microarrays fabricated on a single 12 cm x 18 cm
sheet of plastic-backed nitrocellulose where each
microarray could contain, for example, 100 DNA
fragments representing all known mutations of a given
gene. The region of interest from each of the DNA
samples from 96 patients could be amplified, labeled,
and hybridized to the 96 individual arrays with each
assay performed in 100 microliters of hybridization
solution. The approximately 1 thick silicone rubber
barrier elements between individual arrays prevent
cross contamination of the patient samples by sealing
the pores of the nitrocellulose and by acting as a
physical barrier between each microarray. The solid
support containing all 96 microarrays assayed with the
96 patient samples is incubated, rinsed, detected and
analyzed as a single sheet of material using standard
radioactive, fluorescent, or colorimetric detection
means (Maniatis, et al., 1989). Previously, such a
procedure would involve the handling, processing and
tracking of 96 separate membranes in 96 separate sealed
chambers. By processing all 96 arrays as a single
sheet of material, significant time and cost savings
are possible.
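
The sheet layout in this example can be checked arithmetically. The Python sketch below assumes, for illustration only, that the 96 microarrays are arranged as 12 columns by 8 rows with the 0.9 cm side along the 12 cm edge; the text does not state the arrangement, and the function names are hypothetical. Dimensions are expressed in whole millimetres.

```python
def layout_margins_mm(sheet_w, sheet_l, cell_w, cell_l, cols, rows):
    """Unused width and length left on the sheet after placing the cells.

    A negative value means the cells do not fit in that direction.
    """
    return sheet_w - cols * cell_w, sheet_l - rows * cell_l


def total_volume_ul(n_assays, volume_per_assay_ul):
    """Total hybridization solution needed for all assays on the sheet."""
    return n_assays * volume_per_assay_ul


# 12 cm x 18 cm sheet, 0.9 cm x 2.2 cm microarrays, assumed 12 x 8 layout
margins = layout_margins_mm(120, 180, 9, 22, cols=12, rows=8)
volume = total_volume_ul(96, 100)
print(margins, volume)
```

Under this assumed layout the arrays occupy 10.8 cm x 17.6 cm, leaving 12 mm and 4 mm in the two directions for the barrier elements and sheet edges, and the 96 assays together use 9.6 ml of hybridization solution.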

The assay format can be reversed where the patient
or organism's DNA is immobilized as the array elements
and each array is hybridized with a different mutated
allele or genetic marker. The gridded solid support
can also be used for parallel non-DNA ELISA assays.
Furthermore, the invention allows for the use of all
standard detection methods without the need to remove
the shallow barrier elements to carry out the detection
step.

In addition to the genetic applications listed
above, arrays of whole cells, peptides, enzymes,
antibodies, antigens, receptors, ligands,
phospholipids, polymers, drug congener preparations or
chemical substances can be fabricated by the means
described in this invention for large scale screening
assays in medical diagnostics, drug discovery,
molecular biology, immunology and toxicology.

The multi-cell substrate aspect of the invention 
allows for the rapid and convenient screening of many 
DNA probes against many ordered arrays of DNA 
fragments. This eliminates the need to handle and 
detect many individual arrays for performing mass 
screenings for genetic research and diagnostic 
applications. Numerous microarrays can be fabricated 
on the same solid support and each microarray reacted 
with a different DNA probe while the solid support is 
processed as a single sheet of material. 

The following examples illustrate, but in no way 
are intended to limit, the present invention. 



Example 1

Genomic-Complexity Hybridization to Micro Arrays of the
Saccharomyces cerevisiae Genome with
Two-Color Fluorescent Detection

The array elements were randomly amplified PCR
(Bohlander, et al., 1992) products using physically
mapped lambda clones of S. cerevisiae genomic DNA
templates (Riles, et al., 1993). The PCR was performed
directly on the lambda phage lysates resulting in an
amplification of both the 35 kb lambda vector and the
5-15 kb yeast insert sequences in the form of a uniform
distribution of PCR product between 250-1500 base pairs
in length. The PCR product was purified using
Sephadex G50 gel filtration (Pharmacia, Piscataway, NJ)
and concentrated by evaporation to dryness at room
temperature overnight. Each of the 864 amplified
lambda clones was rehydrated in 15 µl of 3 x SSC in
preparation for spotting onto the glass.

The micro arrays were fabricated on microscope
slides which were coated with a layer of poly-l-lysine
(Sigma). The automated apparatus described in Section
IV loaded 1 µl of the concentrated lambda clone PCR
product in 3 x SSC directly from 96 well storage plates
into the open capillary printing element and deposited
~5 nl of sample per slide at 380 micron spacing between
spots, on each of 40 slides. The process was repeated
for all 864 samples and 8 control spots. After the
spotting operation was complete, the slides were
rehydrated in a humid chamber for 2 hours, baked in a
dry 80° vacuum oven for 2 hours, rinsed to remove
unabsorbed DNA and then treated with succinic anhydride
to reduce non-specific adsorption of the labeled
hybridization probe to the poly-l-lysine coated glass
surface. Immediately prior to use, the immobilized DNA
on the array was denatured in distilled water at 90°
for 2 minutes.
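
The quantities given for the spotting run can be tallied as follows. The Python sketch below is a worked example only; it treats the approximate 5 nl deposit as exact, and the function names are assumptions introduced for illustration.

```python
def dispensed_per_load_nl(per_spot_nl, n_slides):
    """Volume deposited from one tip load, one spot on each slide."""
    return per_spot_nl * n_slides


def total_deposits(n_samples, n_controls, n_slides):
    """Number of individual tap deposits in the whole run."""
    return (n_samples + n_controls) * n_slides


LOAD_NL = 1000                                # 1 microliter loaded per sample
used = dispensed_per_load_nl(5, 40)           # about 5 nl on each of 40 slides
left_in_tip = LOAD_NL - used
taps = total_deposits(864, 8, 40)
print(used, left_in_tip, taps)
```

On these figures only about a fifth of each 1 µl load is deposited, and the run comprises 872 spots on each of 40 slides.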

For the pooled chromosome experiment, the 16
chromosomes of Saccharomyces cerevisiae were separated
in a CHEF agarose gel apparatus (Biorad, Richmond, CA).
The six largest chromosomes were isolated in one gel
slice and the smallest 10 chromosomes in a second gel
slice. The DNA was recovered using a gel extraction
kit (Qiagen, Chatsworth, CA). The two chromosome pools
were randomly amplified in a manner similar to that
used for the target lambda clones. Following
amplification, 5 micrograms of each of the amplified
chromosome pools were separately random-primer labeled
using Klenow polymerase (Amersham, Arlington Heights,
IL) with a lissamine conjugated nucleotide analog
(Dupont NEN, Boston, MA) for the pool containing the
six largest chromosomes, and with a fluorescein
conjugated nucleotide analog (BMB) for the pool
containing the smallest ten chromosomes. The two pools
were mixed and concentrated using an ultrafiltration
device (Amicon, Danvers, MA).

Five micrograms of the hybridization probe
consisting of both chromosome pools in 7.5 µl of TE was
denatured in a boiling water bath and then snap cooled
on ice. 2.5 µl of concentrated hybridization solution
(5 x SSC and 0.1% SDS) was added and all 10 µl
transferred to the array surface, covered with a cover
slip, placed in a custom-built single-slide humidity
chamber and incubated at 60° for 12 hours. The slides
were then rinsed at room temperature in 0.1 x SSC and
0.1% SDS for 5 minutes, cover slipped and scanned.

A custom-built laser fluorescent scanner was used
to detect the two-color hybridization signals from the
1.8 x 1.8 cm array at 20 micron resolution. The
scanned image was gridded and analyzed using custom
image analysis software. After correcting for optical
crosstalk between the fluorophores due to their
overlapping emission spectra, the red and green
hybridization values for each clone on the array were
correlated to the known physical map position of the
clone resulting in a computer-generated color karyotype
of the yeast genome.
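
The text does not state how the crosstalk correction was performed. One common approach, shown here purely as an assumption, is to model each measured channel as the true signal of its own fluorophore plus a fixed fraction of the other and to solve the resulting pair of linear equations. The Python sketch below illustrates that approach; the coefficient values, the background threshold and all function names are hypothetical.

```python
def unmix(red_measured, green_measured, green_into_red, red_into_green):
    """Recover per-fluorophore signals from two crosstalk-affected channels.

    Assumed linear model (not specified in the text):
        red_measured   = red   + green_into_red * green
        green_measured = green + red_into_green * red
    """
    det = 1.0 - green_into_red * red_into_green
    if det == 0.0:
        raise ValueError("crosstalk coefficients make the system singular")
    red = (red_measured - green_into_red * green_measured) / det
    green = (green_measured - red_into_green * red_measured) / det
    return red, green


def spot_color(red, green, background=10.0):
    """Assign a display color to a spot from its corrected signals."""
    has_red = red > background
    has_green = green > background
    if has_red and has_green:
        return "orange"   # signal from both probe pools
    if has_red:
        return "red"
    if has_green:
        return "green"
    return "dark"


# true signals 100 (red) and 50 (green) with 10% and 20% assumed bleed-through
red, green = unmix(105.0, 70.0, green_into_red=0.1, red_into_green=0.2)
print(round(red, 6), round(green, 6), spot_color(red, green))
```

With the corrected values in hand, each clone's color can be placed at its known physical map position to build the karyotype.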

Figure 6 shows the hybridization pattern of the
two chromosome pools. A red signal indicates that the
lambda clone on the array surface contains a cloned
genomic DNA segment from one of the largest six yeast
chromosomes. A green signal indicates that the lambda
clone insert comes from one of the smallest ten yeast
chromosomes. Orange signals indicate repetitive
sequences which cross hybridized to both chromosome
pools. Control spots on the array confirm that the
hybridization is specific and reproducible.

The physical map locations of the genomic DNA
fragments contained in each of the clones used as array
elements have been previously determined by Olson and
co-workers (Riles, et al.) allowing for the automatic
generation of the color karyotype shown in Figure 7.
The color of a chromosomal section on the karyotype
corresponds to the color of the array element
containing the clone from that section. The black
regions of the karyotype represent false negative dark
spots on the array (10%) or regions of the genome not
covered by the Olson clone library (90%). Note that
the largest six chromosomes are mainly red while the
smallest ten chromosomes are mainly green, matching the
original CHEF gel isolation of the hybridization probe.
Areas of the red chromosomes containing green spots and
vice-versa are probably due to spurious sample tracking
errors in the formation of the original library and in
the amplification and spotting procedures.

The yeast genome arrays have also been probed with 
individual clones or pools of clones that are 
fluorescently labeled for physical mapping purposes. 
The hybridization signals of these clones to the array 
were translated into a position on the physical map of 
yeast. 



Example 2 

Total cDNA Hybridized to Micro Arrays of
cDNA Clones with Two-Color
Fluorescent Detection

24 clones containing cDNA inserts from the plant
Arabidopsis were amplified using PCR. Salt was added
to the purified PCR products to a final concentration
of 3 x SSC. The cDNA clones were spotted on
poly-l-lysine coated microscope slides in a manner similar to
Example 1. Among the cDNA clones was a clone
representing a transcription factor HAT4, which had
previously been used to create a transgenic line of the
plant Arabidopsis, in which this gene is present at ten
times the level found in wild-type Arabidopsis (Schena,
et al., 1992).

Total poly-A mRNA from wild type Arabidopsis was
isolated using standard methods (Maniatis, et al.,
1989) and reverse transcribed into total cDNA, using
fluorescein nucleotide analog to label the cDNA product
(green fluorescence). A similar procedure was
performed with the transgenic line of Arabidopsis where
the transcription factor HAT4 was inserted into the
genome using standard gene transfer protocols. cDNA
copies of mRNA from the transgenic plant were labeled
with a lissamine nucleotide analog (red fluorescence).
Two micrograms of the cDNA products from each type of
plant were pooled together and hybridized to the cDNA
clone array in a 10 microliter hybridization reaction
in a manner similar to Example 1. Rinsing and
detection of hybridization was also performed in a
manner similar to Example 1. Fig. 8 shows the resulting
hybridization pattern of the array.

Genes equally expressed in wild type and the
transgenic Arabidopsis appeared yellow due to equal
contributions of the green and red fluorescence to the
final signal. The dots are different intensities of
yellow indicating various levels of gene expression.
The cDNA clone representing the transcription factor
HAT4, expressed in the transgenic line of Arabidopsis
but not detectably expressed in wild type Arabidopsis,
appears as a red dot (with the arrow pointing to it),
indicating the preferential expression of the
transcription factor in the red-labeled transgenic
Arabidopsis and the relative lack of expression of the
transcription factor in the green-labeled wild type
Arabidopsis.

An advantage of the microarray hybridization
format for gene expression studies is the high partial
concentration of each cDNA species achievable in the 10
microliter hybridization reaction. This high partial
concentration allows for detection of rare transcripts
without the need for PCR amplification of the
hybridization probe which may bias the true genetic
representation of each discrete cDNA species.

Gene expression studies such as these can be used 
for genomics research to discover which genes are 
expressed in which cell types, disease states, 
development states or environmental conditions. Gene
expression studies can also be used for diagnosis of
disease by empirically correlating gene expression 
patterns to disease states. 



Example 3 

Multiplexed Colorimetric Hybridization on
a Gridded Solid Support

A sheet of plastic-backed nitrocellulose was
gridded with barrier elements made from silicone rubber
according to the description in Section IV-A. The
sheet was soaked in 10 x SSC and allowed to dry. As
shown in Fig. 12, 192 M13 clones each with a different
yeast insert were arrayed 400 microns apart in four
quadrants of the solid support using the automated
device described in Section III. The bottom left
quadrant served as a negative control for hybridization
while each of the other three quadrants was hybridized
simultaneously with a different oligonucleotide using
the open-face hybridization technology described in
Section IV-A. The first two and last four elements of
each array are positive controls for the colorimetric
detection step.

The oligonucleotides were labeled with fluorescein 
which was detected using an anti-fluorescein antibody
conjugated to alkaline phosphatase that precipitated an 
NBT/BCIP dye on the solid support (Amersham) . Perfect 
matches between the labeled oligos and the M13 clones 
resulted in dark spots visible to the naked eye and 
detected using an optical scanner (HP ScanJet II) 
attached to a personal computer. The hybridization 
patterns are different in every quadrant indicating 
that each oligo found several unique M13 clones from 
among the 192 with a perfect sequence match. Note that 
the open capillary printing tip leaves detectable 
dimples on the nitrocellulose which can be used to 
automatically align and analyze the images. 

Although the invention has been described with
respect to specific embodiments and methods, it will be
clear that various changes and modifications may be made
without departing from the invention.

IT IS CLAIMED:



1. A method of forming a microarray of analyte-assay
regions on a solid support, where each region in
the array has a known amount of a selected,
analyte-specific reagent, said method comprising,

(a) loading a solution of a selected analyte-specific
reagent in a reagent-dispensing device having
an elongate capillary channel (i) formed by
spaced-apart, coextensive elongate members, (ii) adapted to
hold a quantity of the reagent solution and (iii)
having a tip region at which aqueous solution in the
channel forms a meniscus,

(b) tapping the tip of the dispensing device
against a solid support at a defined position on the
surface, with an impulse effective to break the
meniscus in the capillary channel and deposit a
selected volume of solution on the surface, and

(c) repeating steps (a) and (b) until said array
is formed.

2. The method of claim 1, wherein said tapping is
carried out with an impulse effective to deposit a
selected volume in the volume range between 0.01 to 100
nl.



3. The method of claim 1, wherein said channel is 
formed by a pair of spaced-apart tapered elements. 



4. The method of claim 1, for forming a plurality
of such arrays, wherein step (b) is applied to a
selected position on each of a plurality of solid
supports at each repeat cycle proceeding step (c).

5. The method of claim 1, which further includes,
after performing steps (a) and (b) at least one time,
reloading the reagent-dispensing device with a new
reagent solution by the steps of (i) dipping the
capillary channel of the device in a wash solution,
(ii) removing wash solution drawn into the capillary
channel, and (iii) dipping the capillary channel into
the new reagent solution.

6. Automated apparatus for forming a microarray
of analyte-assay regions on a plurality of solid
supports, where each region in the array has a known
amount of a selected, analyte-specific reagent, said
apparatus comprising

(a) a holder for holding, at known positions, a
plurality of planar supports,

(b) a reagent dispensing device having an open
capillary channel (i) formed by spaced-apart,
coextensive elongate members (ii) adapted to hold a
quantity of the reagent solution and (iii) having a tip
region at which aqueous solution in the channel forms a
meniscus,

(c) positioning means for positioning the
dispensing device at a selected array position with
respect to a support in said holder,

(d) dispensing means for moving the device into
tapping engagement against a support with a selected
impulse, when the device is positioned at a defined
array position with respect to that support, with an
impulse effective to break the meniscus of liquid in
the capillary channel and deposit a selected volume of
solution on the surface, and

(e) control means for controlling said positioning 
and dispensing means.

7. The apparatus of claim 6, wherein said
dispensing means is effective to move said dispensing
device against a support with an impulse effective to
deposit a selected volume in the volume range between
0.01 to 100 nl.

8. The apparatus of claim 6, wherein said channel 
is formed by a pair of spaced-apart tapered elements. 

9. The apparatus of claim 6, wherein the control
means operates to (i) place the dispensing device at a
loading station, (ii) move the capillary channel in the
device into a selected reagent at the loading station,
to load the dispensing device with the reagent, and
(iii) dispense the reagent at a defined array position
on each of the supports on said holder.

10. The apparatus of claim 6, wherein the control
device further operates, at the end of a dispensing
cycle, to wash the dispensing device by (i) placing the
dispensing device at a washing station, (ii) moving the
capillary channel in the device into a wash fluid, to
load the dispensing device with the fluid, and (iii)
remove the wash fluid prior to loading the dispensing
device with a fresh selected reagent.

11. The apparatus of claim 6, wherein said device
is one of a plurality of such devices which are carried
on the arm for dispensing different analyte assay
reagents at selected spaced array positions.

12. A substrate with a surface having a
microarray of at least 10^3 distinct polynucleotide or
polypeptide biopolymers per 1 cm^2 surface area, each
distinct biopolymer sample (i) being disposed at a
separate, defined position in said array, (ii) having a
length of at least 50 subunits, and (iii) being present
in a defined amount between about 0.1 femtomole and 100
nanomoles.

13. The substrate of claim 12, wherein said
surface is a glass slide coated with polylysine, and said
biopolymers are polynucleotides.

14. The substrate of claim 12, wherein said 
substrate has a water-impermeable backing, a water- 
permeable film formed on the backing, and a grid formed 
on the film, where said grid (i) is composed of 
intersecting water-impervious grid elements extending 
from said backing to positions raised above the surface 
of said film, and (ii) partitions the film into a 
plurality of water-impervious cells, where each cell 
contains such a biopolymer array. 

15. A substrate with a surface array of sample- 
receiving cells, comprising 

a water-impermeable backing, 

a water-permeable film formed on the backing, and 
a grid formed on the film, said grid being composed of 
intersecting water-impervious grid elements extending
from said backing to positions raised above the surface 
of said film. 

16. The substrate of claim 15, wherein the cells
of the array each contain an array of biopolymers.

17. A substrate for use in detecting binding of
labeled biopolymers to one or more of a plurality
distinct polynucleotides, comprising

a non-porous, glass substrate,

a coating of a cationic polymer on said substrate,
and

an array of distinct polynucleotides to said
coating, where each biopolymer is disposed at a
separate, defined position in a surface array of
biopolymers.

18. A method of detecting differential expression
of each of a plurality of genes in a first cell type
with respect to expression of the same genes in a
second cell type, said method comprising

producing fluorescence-labeled cDNA's from mRNA's
isolated from the two cell types, where the cDNA's
from the first and second cells are labeled with first
and second different fluorescent reporters,

adding a mixture of the labeled cDNA's from the
two cell types to an array of polynucleotides
representing a plurality of known genes derived from
the two cell types, under conditions that result in
hybridization of the cDNA's to complementary-sequence
polynucleotides in the array; and

examining the array by fluorescence under
fluorescence excitation conditions in which (i)
polynucleotides in the array that are hybridized
predominantly to cDNA's derived from one of the first
and second cell types give a distinct first or second
fluorescence emission color, respectively, and (ii)
polynucleotides in the array that are hybridized to
substantially equal numbers of cDNA's derived from the
first and second cell types give a distinct combined
fluorescence emission color, respectively,

wherein the relative expression of known genes in
the two cell types can be determined by the observed
fluorescence emission color of each spot.

19. The method of claim 18, wherein the array of
polynucleotides is formed on a substrate with a surface
having an array of at least 10^3 distinct polynucleotide
or polypeptide biopolymers in a surface area of less
than about 1 cm^2, each distinct biopolymer (i) being
disposed at a separate, defined position in said array, 
(ii) having a length of at least 50 subunits, and (iii) 
being present in a defined amount between about .1 
femtomole and 100 nmoles. 

20. The method of claim 19, wherein said surface 
is a glass slide coated with poly lysine, and said 
biopolymers are polynucleotides non-covalently bound to 
said polylysine.

[57] ABSTRACT

Methods and compositions for modeling the transcriptional 
responsiveness of an organism to a candidate drug involve 
(a) detecting reporter gene product signals from each of a 
plurality of different, separately isolated cells of a target 
organism, wherein each cell contains a recombinant con- 
struct comprising a reporter gene operatively linked to a 
different endogenous transcriptional regulatory element of 
the target organism such that the transcriptional regulatory 
element regulates the expression of the reporter gene, and 
the sum of the cells comprises an ensemble of the transcrip- 
tional regulatory elements of the organism sufficient to 
model the transcriptional responsiveness of said organism to 
a drug; (b) contacting each cell with a candidate drug; (c) 
detecting reporter gene product signals from each cell; (d) 
comparing reporter gene product signals from each cell 
before and after contacting the cell with the candidate drug 
to obtain a drug response profile which provides a model of 
the transcriptional responsiveness of said organism to the 
candidate drug. 

8 Claims, No Drawings

METHODS FOR DRUG SCREENING
BACKGROUND 



The field of the invention is pharmaceutical drug screening.
Pharmaceutical research and development is a multi-billion
dollar industry. Much of these resources are consumed
in efforts to focus the specificity of lead compounds.
In addition, many programs are aborted after decades of
costly yet fruitless efforts to limit side effects or toxicity of
candidate drugs. Accordingly, tools that can abbreviate the
research and discovery phase of drug development are
desirable. Several in vitro or cell culture-based methods
have been described for identifying compounds with a
particular biological effect through the activation of a linked
reporter. Gadski et al. (1992) EP 92304902.7 describes
methods for identifying substances which regulate the synthesis
of an apolipoprotein; Evans et al. (1991) U.S. Pat. No.
4,981,784 describes methods for identifying ligand for a
receptor; and Farr et al. (1994) WO 94/17208 describes
methods and kits utilizing stress promoters to determine
toxicity of a compound.

In general, the principle that has been applied in the 
existing pharmaceutical industry for the discovery and 
development of new lead compounds for drugs has been the 
establishment of sensitive and reliable in vitro assays for 
purified enzymes, and then screening large numbers of
compounds and culture supernatants for any ability to inhibit
enzyme activity. The present invention exploits the recent
advances in genome science to provide for the rapid screening
of large numbers of compounds against a systemic target
comprising substantially all targets in a pathway, organism,
etc. for rare compounds having the ability to inhibit the
protein of interest. The invention described herein in effect
turns the drug discovery process inside out. This invention
provides information on the mechanism of action of every
compound that affects cells, regardless of the target. In
addition, the relative specificity of all lead compounds is 
immediately established. 



SUMMARY OF THE INVENTION

The invention provides methods and compositions for
estimating the physiological specificity of a candidate drug.
In general, the subject methods involve (a) detecting reporter
gene product signals from each of a plurality of different,
separately isolated cells of a target organism, wherein each
of said cells contains a recombinant construct comprising a
reporter gene operatively linked to a different endogenous
transcriptional regulatory element (e.g. promoter) of said
target organism such that said transcriptional regulatory
element regulates the expression of said reporter gene,
wherein said plurality of cells comprises an ensemble of the
transcriptional regulatory elements of said organism sufficient
to model the transcriptional responsiveness of said
organism to a drug; (b) contacting each said cell with a
candidate drug; (c) detecting reporter gene product signals
from each of said cells; (d) comparing said reporter gene
product signals from each of said cells before and after
contacting each of said cells with said candidate drug to
obtain a drug response profile; wherein said drug response
profile provides an estimate of the physiological specificity
or biological interactions of said candidate drug.

DETAILED DESCRIPTION OF THE
INVENTION

The Genome Reporter Matrix.

The invention provides methods and compositions for
estimating the physiological specificity of a candidate drug
by modeling the transcriptional responses of the target
organism with an ensemble of reporters, the expressions of
which are regulated by transcription regulatory genetic
elements derived from the genome of the target organism.
The ensemble of reporting cells comprises as comprehensive
a collection of transcription regulatory genetic elements as is
conveniently available for the targeted organism so as to
most accurately model the systemic transcriptional response.
Suitable ensembles generally comprise thousands of individually
reporting elements; preferred ensembles are substantially
comprehensive, i.e. provide a transcriptional
response diversity comparable to that of the target organism.
Generally, a substantially comprehensive ensemble requires
transcription regulatory genetic elements from at least a
majority of the organism's genes, and preferably includes
those of all or nearly all of the genes. We term such a
substantially comprehensive ensemble a genome reporter
matrix.



It is frequently convenient to use an ensemble or genome
reporter matrix derived from a lower eukaryote or common
animal model to obtain preliminary information on drug
specificity in higher eukaryotes, such as humans. Because
yeast, such as Saccharomyces cerevisiae, is a bona fide
eukaryote, there is substantial conservation of biochemical
function between yeast and human cells in most pathways,
from the sterol biosynthetic pathway to the Ras oncogene.
Indeed, the absence of many effective antifungal compounds
illustrates how difficult it has been to find therapeutic targets
that would selectively kill fungal but not human cells. One
example of a shared response pathway is sterol biosynthesis.
In human cells, the drug Mevacor (lovastatin) inhibits
HMG-CoA reductase, the key regulatory enzyme of the
sterol biosynthetic pathway. As a result, the level of a
particular regulatory sterol decreases, and the cells respond
by increased transcription of the gene encoding the LDL
receptor. In yeast, Mevacor also inhibits HMG-CoA reductase
and lowers the level of a key regulatory sterol. Yeast
cells respond in an analogous fashion to human cells.
However, yeast do not have a gene for the LDL receptor.
Instead, the same effect is measured by increased transcription
of the ERG10 gene, which encodes acetoacetyl CoA
thiolase, an enzyme also involved in sterol synthesis. Thus,
the regulatory response is conserved between yeast and
humans, even though the identity of the responding gene is
different.

Advantages of the Genome Reporter Matrix as a 
Vehicle for Pharmaceutical Development 

The advantages of the subject methods over prior art 
screening methods may be illustrated by examples. Consider 
the difference between an in vitro assay for HMG-CoA 
reductase inhibitors as presently practiced by the pharmaceutical
industry, and an assay for inhibitors of sterol biosynthesis
as revealed by the ERG10 reporter. In the case of
the former, information is obtained only for those rare
compounds that happen to inhibit this one enzyme. In
contrast, in the case of the ERG10 reporter, any compound
that inhibits nearly any of the approximately 35 steps in the
sterol biosynthetic pathway will, by lowering the level of
intracellular sterols, induce the synthesis of the reporter.
Thus, the reporter can detect a much broader range of targets
than can the purified enzyme, in this case 35 times more than
the in vitro assay.

Drugs often have side effects that are in part due to the
lack of target specificity. However, the in vitro assay of
HMG-CoA reductase provides no information on the specificity
of a compound. In contrast, a genome reporter matrix
reveals the spectrum of other genes in the genome also
affected by the compound. In considering two different
compounds both of which induce the ERG10 reporter, if one 
compound affects the expression of 5 other reporters and a 
second compound affects the expression of 50 other report- 
ers, the first compound is, a priori, more likely to have fewer 
side effects. Because the identity of the reporters is known 
or determinable, information on other affected reporters is 
informative as to the nature of the side effect. A panel of 
reporters can be used to test derivatives of the lead com- 
pound to determine which of the derivatives have greater 
specificity than the first compound. 
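
The comparison of two ERG10-inducing compounds can be sketched in a few lines of Python. This is an illustration only: the reporter names other than ERG10, the signal values, and the two-fold threshold for calling a reporter "affected" are assumptions and do not come from the specification. Basal signals are assumed to be positive.

```python
def affected_reporters(before, after, fold=2.0):
    """Return reporters whose signal changed at least `fold`-fold, up or down.

    `before` and `after` map reporter names to positive signal values.
    The two-fold default threshold is an illustrative assumption.
    """
    hits = set()
    for name, basal in before.items():
        ratio = after[name] / basal
        if ratio >= fold or ratio <= 1.0 / fold:
            hits.add(name)
    return hits


def off_target_count(before, after, intended="ERG10", fold=2.0):
    """Count affected reporters other than the intended one."""
    return len(affected_reporters(before, after, fold) - {intended})


# Hypothetical basal and post-treatment signals for four reporters.
basal = {"ERG10": 1.0, "R1": 1.0, "R2": 1.0, "R3": 1.0}
compound_a = {"ERG10": 4.0, "R1": 1.1, "R2": 0.9, "R3": 1.0}
compound_b = {"ERG10": 4.0, "R1": 3.0, "R2": 0.4, "R3": 1.0}

ranked = sorted(
    {"A": compound_a, "B": compound_b}.items(),
    key=lambda item: off_target_count(basal, item[1]),
)
```

Under these assumptions the compound with the fewest off-target reporters sorts first, which mirrors the a priori preference described above.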

As another example, consider the case of a compound that 
does not affect the in vitro assay for HMG-CoA reductase 
nor induces the expression of the ERG10 reporter. In the 
traditional approach to drug discovery, a compound that 
does not inhibit the target being tested provides no useful 
information. However, a compound having any significant
effect on a biological process generally has some consequence
on gene expression. A genome reporter matrix can
thus provide two different kinds of information for most
compounds. In some cases, the identity of reporter genes
affected by the inhibitor evidences how the inhibitor
functions. For example, a compound that induces a cAMP-dependent
promoter in yeast may affect the activity of the
Ras pathway. Even where the compound affects the expression
of a set of genes that do not evidence the action of the
compound, the matrix provides a comprehensive assessment
of the action of the compound that can be stored in a
database for later analyses. A library of such matrix response
profiles can be continuously investigated, much as the
Spectral Compendiums of chemistry are continually referenced
in the chemical arts. For example, if the database
reveals that compound X alters the expression of gene Y, and 
a paper is published reporting that the expression of gene Y 
is sensitive to, for example, the inositol phosphate signaling 
pathway, compound X is a candidate for modulating the 
inositol phosphate signaling pathway. In effect the genome 
reporter matrix is an informational translator that takes 
information on a gene directly to a compound that may 
already have been found to affect the expression of that gene.
This tool should dramatically shorten the research and
discovery phase of drug development, and effectively lever- 
age the value of the publicly available research portfolio on 
all genes. 
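
The "informational translator" role of a stored profile library can be sketched as an inverted index from genes to compounds. The layout below is a minimal sketch under stated assumptions: the library contents, the compound and gene names, and the function name `build_gene_index` are hypothetical and are not described in the specification.

```python
from collections import defaultdict


def build_gene_index(profiles):
    """Invert {compound: set of affected genes} into {gene: set of compounds}."""
    index = defaultdict(set)
    for compound, genes in profiles.items():
        for gene in genes:
            index[gene].add(compound)
    return index


# Hypothetical stored response profiles (illustrative names only).
library = {
    "compound X": {"gene Y", "gene Z"},
    "compound W": {"gene Z"},
}

index = build_gene_index(library)

# A newly published finding about gene Y leads directly to candidate compounds.
candidates = sorted(index["gene Y"])
```

With an index of this kind, new information on a gene can be translated into a list of compounds already known to alter its expression without re-screening.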

In many cases, a drug of interest would work on protein 
targets whose impact on gene expression would not be 
known a priori. The genome reporter matrix can neverthe- 
less be used to estimate which genes would be induced or 
repressed by the drug. In one embodiment, a dominant 
mutant form of the gene encoding a drug-targeted protein is 
introduced into all the strains of the genome reporter matrix 
and the effect of the dominant mutant, which interferes with 
the gene product's normal function, evaluated for each 
reporter. This genetic assay informs us which genes would 
be affected by a drug that has a similar mechanism of action. 
In many cases, the drug itself could be used to obtain the 
same information. However, even if the drug itself were not 
available, genetics can be used to predetermine what its 
response profile would be in the genome reporter matrix. 
Furthermore, it is not necessary to know the identity of any 
of the responding genes. Instead, the genetic control with the 
dominant mutant sorts the genome into those genes that 
respond and those that do not. Hence, if drugs that disrupt a 
given cellular function were desired, dominant mutants for
such function introduced into the genome reporter matrix
reveal what response profile to expect for such an agent.
For example, taxol, a recent advance in potential breast
cancer therapies, has been shown to interfere with tubulin-based
cytoskeletal elements. Hence, a dominant mutant form
of tubulin provides a response profile informative for breast
cancer therapies with similar modes of action to taxol.
Specifically, a dominant mutant form of tubulin is introduced
into all the strains of the genome reporter matrix and
the effect of this dominant mutant, which interferes with the 
microtubule cytoskeleton, evaluated for each reporter. Thus, 
any new compound that induces the same response profile as 
the dominant tubulin mutant would provide a candidate for 
a taxol-like pharmaceutical. 
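
One way to decide whether a compound "induces the same response profile" as a dominant mutant is to correlate the two profiles across reporters. The sketch below uses a Pearson correlation of log-scale responses with a 0.9 cutoff; the similarity measure, the cutoff, the reporter names, and the values are all assumptions chosen for illustration, since the specification does not prescribe a particular metric. Profiles are assumed to vary across reporters (non-zero variance).

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def matches_reference(candidate, reference, threshold=0.9):
    """True if the candidate profile correlates with the reference profile.

    Both arguments map reporter names to log-scale responses; the
    threshold is an illustrative assumption.
    """
    reporters = sorted(reference)
    r = pearson(
        [candidate[k] for k in reporters],
        [reference[k] for k in reporters],
    )
    return r >= threshold


# Hypothetical log-ratio profiles across four reporters.
tubulin_mutant = {"R1": 2.0, "R2": -1.0, "R3": 0.0, "R4": 1.5}
similar_compound = {"R1": 1.8, "R2": -0.9, "R3": 0.1, "R4": 1.4}
unrelated_compound = {"R1": -1.0, "R2": 2.0, "R3": 0.0, "R4": 0.2}
```

A compound passing such a test would only be a candidate; the choice of metric and cutoff would need to be validated against known compounds.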

In addition, the genome reporter matrix can be used to 
genetically create or model various disease states. In this 
way, pathways present specifically in the disease state can be 
targeted. For example, the specific response profile of transforming
mutant Ras2^val19 identifies Ras2^val19 induced
reporters. Here, the matrix, in which each unit contains the
Ras2^val19 mutation, is used to screen for compounds that
restore the response profile to that of the matrix lacking the
mutation.

Though these examples are directed to the development of
human therapeutics, informative response profiles can often
be obtained in nonhuman reporter matrices. Hence, for
disease causing genes with yeast homologs, even if the
function of the gene is not known, a dominant form of the
gene can be introduced into a yeast-based reporter matrix to
identify disease state specific pathways for targeting. For
example, a reporter matrix comprising the yeast mutant
Ras2^val19 provides a discovery vehicle for pathways specific
to the human analog, the oncogene Ras2^val12.

Application of Novel Combinatorial Chemistries 
with the Genome Reporter Matrix. 

Among the most important advances in drug development 
have been advances in combinatorial synthesis of chemical 
libraries. In conventional drug screening with purified 
enzyme targets, combinatorial chemistries can often help 
create new derivatives of a lead compound that will also 
inhibit the target enzyme but with some different and desir- 
able property. However, conventional methods would fail to 
recognize a molecule having a substantially divergent speci- 
ficity. The genome reporter matrix offers a simple solution to 
recognizing new specificities in combinatorial libraries. Spe- 
cifically, pools of new compounds are tested as mixtures 
across the matrix. If the pool has any new activity not 
present in the original lead compound, new genes are 
affected among the reporters. The identity of that gene 
provides a guide to the target of the new compound. Fur- 
thermore, the matrix offers an added bonus that compensates 
for a common weakness in most chemical syntheses. Specifically,
most syntheses produce the desired product in
greatest abundance and a collection of other related products
as contaminants due to side reactions in the synthesis.
Traditionally the solution to contaminants is to purify away
from them. However, the genome reporter matrix exploits
the presence of these contaminants. Syntheses can be
adjusted to make them less specific with a greater number of 
side reactions and more contaminants to determine whether 
anything in the total synthesis affects the expression of target 
genes of interest. If there is a component of the mixture with 
the desired activity on a particular reporter, that reporter can 
be used to assay purification of the desired component from 
the mixture. In effect, the reporter matrix allows a focused 
survey of the effect on single genes to compensate for the
impurity of the mixture being tested.
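
Detecting a new activity in a pooled mixture amounts to finding reporters that respond to the pool but not to the original lead compound. A minimal sketch follows; the reporter names and the function name `new_activities` are hypothetical, and the sets of responding reporters are assumed to have been determined beforehand by whatever threshold the screen uses.

```python
def new_activities(pool_hits, lead_hits):
    """Reporters affected by the pooled mixture but not by the lead compound."""
    return set(pool_hits) - set(lead_hits)


# Hypothetical sets of responding reporters.
lead_hits = {"ERG10", "R7"}
pool_hits = {"ERG10", "R7", "R42"}

# Any reporter returned here points to a component with a new specificity
# and can then serve as the assay for purifying that component.
novel = new_activities(pool_hits, lead_hits)
```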

Isoprenoids are an especially attractive class for the genome
reporter matrix. In nature, isoprenoids are the champion
signaling molecules. Isoprenoids are derivatives of the five
carbon compound isoprene, which is made as an intermediate
in cholesterol biosynthesis. Isoprenoids include many
of the most famous fragrances, pigments, and other biologically
active compounds, such as the antifungal sesquiterpenoids,
which plants use defensively against fungal infection.
There are roughly 10,000 characterized isoprene derivatives
and many more potential ones. Because these compounds
are used in nature to signal biological processes, they are
likely to include some of the best membrane permeant
molecules.

Isoprenes possess another characteristic that lends itself 
well to drug discovery through the genome reporter matrix. 
Pure isoprenoid compounds can be chemically treated to 
create a wide mixture of different compounds quickly and 
easily, due to the particular arrangement of double bonds in 
the hydrocarbon chains. In effect, isoprenoids can be 
mutagenized from one form into many different forms much 
as a wild-type gene can be mutagenized into many different 
mutants. For example, vitamin D used to fortify milk is 
produced by ultraviolet irradiation of the isoprene derivative 
known as ergosterol. New biologically active isoprenoids 
are generated and analyzed with a genome reporter matrix as 
follows. First, a pure isoprenoid such as limonene is tested to
determine its response profile across the matrix. Next, the
isoprenoid (e.g. limonene) is chemically altered to create a
mixture of different compounds. This mixture is then tested
across the matrix. If any new responses are observed, then
the mixture has new biologically active species. In addition,
the identity of the reporter genes provides information
regarding what the new active species does, an activity to be
used to monitor its purification, etc. This strategy is also
applied to other mutable chemical families in addition to
isoprenoids.

Applications of the Genome Reporter Matrix in
Antibiotic and Antifungal Discovery. 

Fungi are important pathogens on plants and animals and
make a major impact on the production of many food crops
and on animal, including human, health. One major difficulty
in the development of antifungal compounds has been
the problem of finding pharmaceutical targets in fungi that
are specific to the fungus. The genome reporter matrix offers
a new tool to solve this problem. Specifically, all molecules
that fail to elicit any response in the Saccharomyces reporter
are collected into a set, which by definition must be either
inactive biologically or have a very high specificity. A
reporter library is created from the targeted pathogen such as
Cryptococcus, Candida, Aspergillus, Pneumocystis, etc. All
molecules from the set that do not affect Saccharomyces are
tested on the pathogen, and any molecule that elicits an
altered response profile in the pathogen in principle identifies
a target that is pathogen-specific. As an example, a
pathogen may have a novel signaling enzyme, such as an
inositol kinase that alters a position on the inositol ring that 
is not altered in other species. A compound that inhibits that 
enzyme would affect the signaling pathway in the pathogen 
and alter a response profile, but due to the absence of that 
enzyme in other organisms, would have no effect. By 
sequencing the reporter genes affected specifically in the 
target fungus and comparing the sequence with others in 
Genbank, one can identify biochemical pathways that are 
unique to the target species. Useful identified products
include not only agents that kill the target fungus but also the
identification of specific targets in the fungus for other
pharmaceutical screening assays.
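
The two-stage antifungal screen described above can be sketched as a filter: first keep molecules that are silent in the Saccharomyces matrix, then keep those that alter any reporter in the pathogen matrix. The compound names, reporter names, and the function name `pathogen_specific` below are hypothetical placeholders, not data from the specification.

```python
def pathogen_specific(compounds, yeast_hits, pathogen_hits):
    """Return {compound: pathogen reporters affected} for compounds that
    affect no Saccharomyces reporter but do affect the pathogen matrix.

    `yeast_hits` and `pathogen_hits` map a compound to the set of
    reporters it affects; a missing or empty entry means no response.
    """
    silent_in_yeast = [c for c in compounds if not yeast_hits.get(c)]
    return {
        c: pathogen_hits[c]
        for c in silent_in_yeast
        if pathogen_hits.get(c)
    }


# Hypothetical screening results.
compounds = ["C1", "C2", "C3"]
yeast_hits = {"C1": {"ERG10"}, "C2": set(), "C3": set()}
pathogen_hits = {"C1": {"P5"}, "C2": {"P9"}, "C3": set()}

# C1 is excluded (active in yeast); C3 is excluded (inactive everywhere).
leads = pathogen_specific(compounds, yeast_hits, pathogen_hits)
```

The reporters listed for each surviving compound are the ones whose genes would then be sequenced and compared against public databases.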

The identification of compounds that kill bacteria has 
been successfully pursued by the pharmaceutical industry 
for decades. It is rather simple to spot a compound that kills 
bacteria in a spot test on a petri plate. Unfortunately, growth
inhibition screens have provided very limited lead compound
diversity. However, there is much complexity to
bacterial physiology and ecology that could offer an edge to
development of combination therapies for bacteria, even for
compounds that do not actually kill the bacterial cell.
Consider, for example, the bacteria that invade the urethra
and persist there through the elaboration of surface attachments
known as fimbriae. Antibiotics in the urine stream
have limited access to the bacteria because the urine stream
is short-lived and infrequent. However, if one could block
the synthesis of the fimbriae to detach the bacteria, existing
therapies would become more effective. Similarly, if the
chemotaxis mechanism of bacteria were crippled, the ability
of bacteria to establish an effective infection would, in some
species, be compromised. A genome reporter matrix for a
bacterial pathogen that contains reporters for the expression
of genes involved in chemotaxis or fimbriae synthesis, as
examples, identifies not only compounds that do kill the
bacteria in a spot test, but also those that interfere with key
steps in the biology of the pathogen. These compounds
would be exceedingly difficult to discover by conventional 
means. 

Applications of Human Cell Based Genome 
Reporter Matrices. 

A genome reporter matrix based on human cells provides 
many important applications. For example, an interesting
application is the development of antiviral compounds.
When human cells are infected by a wide range of viruses,
the cells respond in a complex way in which only a few of
the components have been identified. For example, certain
interferons are induced, as is a double-stranded RNase. Both
of these responses individually provide some measure of
protection. A matrix that reports the induction of interferon
genes and the double-stranded RNase is able to detect
compounds that could prophylactically protect cells before
the arrival of the virus. Other protective effects may be
induced in parallel. The incorporation of a panel of other 
reporter genes in the matrix is used to identify those com- 
pounds with the highest degree of specificity. 

Use of the Genome Reporter Matrix. 

The procedure to be followed in the subject methods will 
now be outlined. The initial step involves determining the 
basal or background response profile by detecting reporter 
gene product signals from each of a plurality of different, 
separately isolated cells of a target organism under one or 
more of a variety of physical conditions, such as temperature 
and pH, medium, and osmolality. As discussed above, the 
target organism may be a yeast, animal model, human, plant, 
pathogen, etc. Generally, the cells are arranged in a physical 
matrix such as a microtiter plate. Each of the cells contains
a recombinant construct comprising a reporter gene operatively
linked to a different endogenous transcriptional regulatory
element of said target organism such that said transcriptional
regulatory element regulates the expression of
said reporter gene. A sufficient number of different recombinant
cells are included to provide an ensemble of transcriptional
regulatory elements of said organism sufficient to
model the transcriptional responsiveness of said organism to
a drug. In a preferred embodiment, the matrix is substan- 
tially comprehensive for the selected regulatory elements, 
e.g. essentially all of the gene promoters of the targeted 
organism are included. Other cis-acting or trans-acting tran- 
scription regulatory regions of the targeted organism can 
also be evaluated. In one embodiment, a genome reporter 
matrix is constructed from a set of lacZ fusions to a 
substantially comprehensive set of yeast genes. The fusions 
are preferably constructed in a diploid cell of the a/α mating
type to allow the introduction of dominant mutations by
mating, though haploid strains also find use with particularly
sensitive reporters for certain functions. The fusions are
conveniently arrayed onto a microtiter plate having 96 wells 
separating distinct fusions into wells having defined alpha- 
numeric X-Y coordinates, where each well (defined as a 
unit) confines a cell or colony of cells having a construct of 
a reporter gene operatively joined to a different transcrip- 
tional promoter. Permanent collections of these plates are 
readily maintained at -80° C. and copies of this collection 
can be made and propagated by simple mechanics and may 
be automated with commercial robotics. 

The methods involve detecting a reporter gene product 
signal for each cell of the matrix. A wide variety of reporters 
may be used, with preferred reporters providing conve- 
niently detectable signals (e.g. by spectroscopy). Typically, 
the signal is a change in one or more electromagnetic 
properties, particularly optical properties at the unit. As 
examples, a reporter gene may encode an enzyme which 
catalyzes a reaction at the unit which alters light absorption 
properties at the unit, radiolabeled or fluorescent tag-labeled 
nucleotides can be incorporated into nascent transcripts 
which are then identified when bound to oligonucleotide 
probes, etc. Examples include β-galactosidase, invertase, 
green fluorescent protein, etc. Invertase fusions have the 
virtue that functional fusions can be selected from complex 
libraries by the ability of invertase to allow those genes 
whose expression increases or decreases by measuring the 
relative growth on medium containing sucrose with or 
without the compound of interest. Electronic detectors for 
optical, radiative, etc. signals are commercially available, 
e.g. automated, multi-well colorimetric detectors, similar to 
automated ELISA readers. Reporter gene product signals 
may also be monitored as a function of other variables such 
as stimulus intensity or duration, time (for dynamic response 
analyses), etc. 

In a preferred embodiment, the basal response profiles are 
determined through the colorimetric detection of a lacZ 
reaction product. The optical signal generated at each well is 
detected and linearly transduced to generate a corresponding 
digital electrical output signal. The resultant electrical out- 
put signals are stored in computer memory as a genome 
reporter output signal matrix data structure associating each 
output signal with the coordinates of the corresponding 
microtiter plate well and the stimulus or drug. This infor- 
mation is indexed against the matrix to form reference 
response profiles that are used to determine the response of 
each reporter to any milieu in which a stimulus may be 
provided. 
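
As an illustrative sketch only, the output signal matrix data structure described above, which associates each output signal with the well coordinates and the stimulus, might be represented as follows. The names WellKey, SignalMatrix, record and profile, and the signal values used in any call, are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WellKey:
    """Hypothetical key: plate coordinates plus the stimulus applied."""
    row: str        # alphabetic plate coordinate, e.g. "A"
    column: int     # numeric plate coordinate, e.g. 1
    stimulus: str   # e.g. "basal" or the name of a drug


class SignalMatrix:
    """Stores one digitized reporter signal per (well, stimulus) pair."""

    def __init__(self) -> None:
        self._signals: Dict[WellKey, float] = {}

    def record(self, row: str, column: int, stimulus: str, signal: float) -> None:
        # Associate the output signal with its well coordinates and stimulus.
        self._signals[WellKey(row, column, stimulus)] = signal

    def profile(self, stimulus: str) -> Dict[str, float]:
        # Return the response profile for one stimulus, indexed by well label.
        return {
            f"{key.row}{key.column}": value
            for key, value in self._signals.items()
            if key.stimulus == stimulus
        }
```

Keying on the stimulus as well as the coordinates lets the basal profile and any number of stimulated profiles share one structure.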

After establishing a basal response profile for the matrix, 
each cell is contacted with a candidate drug. The term 'drug' 
is used loosely to refer to agents which can provoke a 
specific cellular response. Preferred drugs are pharmaceuti- 
cal agents, particularly therapeutic agents. The drug induces 
a complex response pattern of repression, silence and induc- 
tion across the matrix (i.e. a decrease in reporter activity at 
some units, an increase at others, and no change at still 
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others). The response profile reflects the cell's transcrip- 
tional adjustments to maintain homeostasis in the presence 
of the drug. While a wide variety of candidate drugs can be 
evaluated, it is important to adjust the incubation conditions 
(e.g. concentration, time, etc.) to preclude cellular stress, and 
hence insure the measurements of pharmaceutically relevant 
response profiles. Hence, the methods monitor transcrip- 
tional changes which the cell uses to maintain cellular 
homeostasis. Cellular stress may be monitored by any con- 
venient way such as membrane potential (e.g. dye exclu- 
sion), cellular morphology, expression of stress response 
genes, etc. In a preferred embodiment, the compound treat- 
ment is performed by transferring a copy of the entire matrix 
to fresh medium containing the first compound of interest. 

After contacting the cells with the candidate drug, the 
reporter gene product signals from each of said cells is again 
measured to determine a stimulated response profile. The 
basal or background response profile is then compared with 
(e.g. subtracted from, or divided into) the stimulated 
response profile to identify the cellular response profile to 
the candidate drug. The cellular response can be character- 
ized in a number of ways. For example, the basal profile can 
be subtracted from the stimulated profile to yield a net 
stimulation profile. In another embodiment, the stimulated 
profile is divided by the basal profile to yield an induction 
ratio profile. Such comparison profiles provide an estimate 
of the physiological specificity of the candidate drug. 
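
The two comparisons named above (subtraction and division) can be sketched as follows. This is a minimal illustration, assuming each profile is a mapping from well label to signal; the function names and the small `floor` guard against division by zero are assumptions added for the example and are not part of the described method.

```python
from typing import Dict

Profile = Dict[str, float]  # well label -> reporter signal


def net_stimulation(basal: Profile, stimulated: Profile) -> Profile:
    """Stimulated profile minus basal profile, well by well."""
    return {well: stimulated[well] - basal[well] for well in basal}


def induction_ratio(basal: Profile, stimulated: Profile, floor: float = 1e-9) -> Profile:
    """Stimulated profile divided by basal profile, well by well.

    `floor` is a hypothetical guard so that a zero basal signal
    does not raise ZeroDivisionError.
    """
    return {well: stimulated[well] / max(basal[well], floor) for well in basal}
```

A well whose signal is unchanged by the drug gives a net stimulation of zero and an induction ratio of one.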

In another embodiment of the invention, a matrix of 
hybridization probes corresponding to a predetermined 
population of genes of the selected organism is used to 
specifically detect changes in gene transcription which result 
from exposing the selected organism or cells thereof to a 
candidate drug. In this embodiment, one or more cells 
derived from the organism is exposed to the candidate drug 
in vivo or ex vivo under conditions wherein the drug effects 
a change in gene transcription in the cell to maintain 
homeostasis. Thereafter, the gene transcripts, primarily 
mRNA, of the cell or cells is isolated by conventional 
means. The isolated transcripts or cDNAs complementary 
thereto are then contacted with an ordered matrix of hybrid- 
ization probes, each probe being specific for a different one 
of the transcripts, under conditions wherein each of the 
transcripts hybridizes with a corresponding one of the 
probes to form hybridization pairs. The ordered matrix of 
probes provides, in aggregate, complements for an ensemble 
of genes of the organism sufficient to model the transcrip- 
tional responsiveness of the organism to a drug. The probes 
are generally immobilized and arrayed onto a solid substrate 
such as a microtiter plate. Specific hybridization may be 
effected, for example, by washing the hybridized matrix 
with excess non-specific oligonucleotides. A hybridization 
signal is then detected at each hybridization pair to obtain a 
matrix-wide signal profile. A wide variety of hybridization 
signals may be used; conveniently, the cells are pre-labeled 
with radionucleotides such that the gene transcripts provide 
a radioactive signal that can be detected in the hybridization 
pairs. The matrix-wide signal profile of the drug-stimulated 
cells is then compared with a matrix-wide signal profile of 
negative control cells to obtain a specific drug response 
profile. 

The invention also provides means for computer-based 
qualitative analysis of candidate drugs and unknown com- 
pounds. A wide variety of reference response profiles may be 
generated and used in such analyses. For example, the 
response of a matrix to loss of function of each protein or 
gene or RNA in the cell is evaluated by introducing a 
dominant allele of a gene to each reporter cell, and deter- 
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mining the response of the reporter as a function of the 
mutation. For this purpose, dominant mutations are pre- 
ferred but other types of mutations can be used. Dominant 
mutations are created by in vitro mutagenesis of cloned 
genes followed by screening in diploid cells for dominant 
mutant alleles. 

In an alternative embodiment, the reporter matrix is 
developed in a strain deficient for the UPF gene function 
wherein the majority of nonsense mutations cause a domi- 
nant phenotype, allowing dominant mutations to be con- 
structed for any gene. UPF1 encodes a protein that causes 
the degradation of mRNAs that, due to mutation, contain 
premature termination codons. In mutants lacking UPF1 
function most nonsense mutations encode short truncated 
protein fragments. Many of these interfere with normal 
protein function and hence have dominant phenotypes. Thus 
in a upf1 mutant, many nonsense alleles behave as dominant 
mutations (see, e.g. Leeds, P. et al. (1992) Molec. Cell 
Biology. 12:2165-77). 

The resultant data identify genetic response profiles. 
These data are sorted by individual gene response to deter- 
mine the specificity of each gene to a particular stimulus. A 
weighting matrix is established which weights the signals 
proportionally to the specificity of the corresponding report- 
ers. The weighting matrix is revised dynamically, incorpo- 
rating data from every screen. A gene regulation function is 
then used to construct tables of regulation identifying which 
cells of the matrix respond to which mutation in an indexed 
gene, and which mutations affect which cells of the matrix. 

Response profiles for an unknown stimulus (e.g. new 
chemicals, unknown compounds or unknown mixtures) may 
be analyzed by comparing the new stimulus response pro- 
files with response profiles to known chemical stimuli. Such 
comparison analyses generally take the form of an indexed 
report of the matches to the reference chemical response 
profiles, ranked according to the weighted value of each 
matching reporter. If there is a match (i.e. perfect score), the 
response profile identifies a stimulus with the same target as 
one of the known compounds upon which the response 
profile database is built. If the response profile is a subset of 
cells in the matrix stimulated by a known compound, the 
new compound is a candidate for a molecule with greater 
specificity than the reference compound. In particular, if the 
reporters responding uniquely to the reference chemical 
have a low weighted response value, the new compound is 
concluded to be of greater specificity. Alternatively, if the 
reporters responding uniquely to the reference compound 
have a high weighted response value, the new compound is 
concluded to be active downstream in the same pathway. If 
the output overlaps the response profile of a known refer- 
ence compound, the overlap is sorted by a quantitative 
evaluation with the weighting matrix to yield common and 
unique reporters. The unique reporters are then sorted 
against the regulation tables and best matches used to 
deduce the candidate target. If the response profile does not 
either overlap or match a chemical response profile, then the 
database is inadequate to infer function and the response 
profile may be added to the reference chemical response 
profiles. 
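
The indexed, ranked report of matches described above might be sketched as follows. This is a hedged illustration, assuming each response profile is reduced to the set of responding reporters and that the weighting matrix is a simple mapping from reporter to weight; the function name `rank_matches` and all reporter labels in any call are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_matches(
    new_profile: Set[str],
    library: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank reference compounds by the summed weight of shared reporters.

    Reporters absent from `weights` contribute zero (an assumption).
    """
    scores = []
    for compound, reference_profile in library.items():
        shared = new_profile & reference_profile
        score = sum(weights.get(reporter, 0.0) for reporter in shared)
        scores.append((compound, score))
    # Highest weighted score first.
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Summing weights over shared reporters is only one possible scoring rule; the text does not specify the exact form of the quantitative evaluation.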

The response profile of a new chemical stimulus may also 
be compared to a known genetic response profile for target 
gene(s). If there is a match between the two response 
profiles, the target gene or its functional pathway is the 
presumptive target of the chemical. If the chemical response 
profile is a subset of a genetic response profile, the target of 
the drug is downstream of the mutant gene but in the same 
pathway. If the chemical response profile includes as a 
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subset a genetic response profile, the target of the chemical 
is deduced to be in the same pathway as the target gene but 
upstream and/or the chemical affects additional cellular 
components. If not, the chemical response profile is novel 
and defines an orphan pathway. 
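
The decision rules in the preceding paragraph can be summarized in a short sketch. It assumes, for illustration only, that the chemical and genetic response profiles are each reduced to a non-empty set of responding reporters; the function name and the returned labels are hypothetical.

```python
from typing import Set


def infer_target_relation(chemical: Set[str], genetic: Set[str]) -> str:
    """Apply the match / subset / superset / novel rules to two profiles."""
    if not chemical or not genetic:
        # Empty profiles are not addressed in the text; rejected here by assumption.
        raise ValueError("both profiles must contain at least one reporter")
    if chemical == genetic:
        return "presumptive target"
    if chemical < genetic:
        # Chemical profile is a proper subset of the genetic profile.
        return "downstream of the mutant gene, same pathway"
    if genetic < chemical:
        # Genetic profile is a proper subset of the chemical profile.
        return "upstream in the same pathway and/or additional components affected"
    return "novel profile (orphan pathway)"
```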

While described in terms of cells comprising reporters 
under the transcriptional control of endogenous regulatory 
regions, there are a number of other means of practicing the 
invention. For example, each unit of a genome reporter 
matrix reporting on gene expression might confine a differ- 
ent oligonucleotide probe capable of hybridizing with a 
corresponding different reporter transcript. Alternatively, 
each unit of a matrix reporting on DNA-protein interaction 
might confine a cell having a first construct of a reporter 
gene operatively joined to a targeted transcription factor 
binding site and a second hybrid construct encoding a 
transcription activation domain fused to a different structural 
gene, i.e. a one-dimensional one-hybrid system matrix. 
Alternatively, each unit of a matrix reporting on protein- 
protein interactions might confine a cell having a first 
construct of a reporter gene operatively joined to a targeted 
transcription factor binding site, a second hybrid construct 
encoding a transcription activation domain fused to a dif- 
ferent constitutionally expressed gene and a third construct 
encoding a DNA-binding domain fused to yet a different 
constitutionally expressed gene, i.e. a two-dimensional two- 
hybrid system matrix. 

The following examples are offered by way of illustration 
and not by way of limitation. 
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EXAMPLES 
1. Transcriptional promoter-reporter gene matrix 
A) Construction of a physical matrix stimulated with the 
drug mevinolin (lovastatin, Mevacor). 

Mevinolin is a compound known to inhibit cholesterol 
biosynthesis. Initially, the maximal non-toxic (as measured 
by cell growth and viability) concentration of mevinolin on 
the reporter cells was determined by serial dilution to be 25 
ug/ml. To produce a mevinolin-stimulated matrix, each well 
of 60 microtiter plates is filled with 100 ul culture medium 
containing 25 ug/ml mevinolin in a 2% ethanol solution. An 
aliquot of each member of the reporter matrix is added to 
each well allowing for a dilution of approximately 1:100. 
The cells are incubated in the medium until the turbidity of 
the average reporter increases by 20 fold. Each well is then 
quantified for turbidity as a measure of growth, and is treated 
with a lysis solution to allow measurement of β-galactosi- 
dase from each fusion. 

B) Generation of an output signal matrix data structure. 
Both the turbidity and the β-galactosidase are read on 
commercially available microtiter plate readers (e.g. Bio- 
Rad) and the data captured as an ASCII file. From this file 
the value of the individual cells in the reporter matrix to a 
2% ethanol solution in the reference response profile is 
subtracted. The difference corresponds to the mevinolin 
response profile. This file is converted in the computer to a 
table indexed by the response of each cell to the inhibitor. 
For example, the genes encoding acetoacetyl-CoA thiolase 
and squalene synthase increase 10 fold, while SIR3, and 
LEU2, two unrelated genes, remain unchanged. The 
response of the reporter matrix to other compounds is 
similarly determined and stored as output response profiles. 

C) Comparison of Signal Matrix data structure with a 
Signal Matrix database. 
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A physical matrix is constructed as describe above except 
the mevinolin is replaced with an unknown test compound. 
The resultant response profile is compared to the response 
profiles of a library of known bioactive compounds and 
analyzed as described above. For example, if the test com- 
pound output profile shows both acetoacetyl-CoA thiolase 
and squalene synthase gene induced, then the output profile 
matches that expected of an inhibitor of cholesterol synthe- 
sis. If the response profile has fewer other cells affected than 
the response profile to mevinolin, the unknown compound is 
a candidate for greater specificity. If the response profile of 
the new chemical affects fewer other reporters than the 
response profile to mevinolin, and if the other reporters 
affected by mevinolin have a lower weighted value, then the 
compound is a candidate for greater specificity. If the 
response profile has more different cells affected than the 
response profile to mevinolin, then the compound is a 
candidate for less specificity. In the case where mixtures of 
compounds are tested, the highest weighted responses are 
evaluated to determine whether they can be deconvoluted 
into the response profile of two different compounds, or of 
two different genetic response profiles. 

2. Reporter transcript-oligonucleotide hybridization probe 
matrix: Construction of stimulated physical matrix and 
generation of an output signal matrix data structure. 

Unlabeled oligonucleotide hybridization probes comple- 
mentary to the mRNA transcript of each yeast gene are 
arrayed on a silicon substrate etched by standard techniques 
(e.g. Fodor et al. (1991) Science 252, 767). The probes are 
of length and sequence to ensure specificity for the corre- 
sponding yeast gene, typically about 24-240 nucleotides in 
length. 

A confluent HeLa cell culture is treated with 15 ug/ml 
mevinolin in 2% ethanol for 4 hours while maintained in a 
humidified 5% CO2 atmosphere at 37° C. Messenger RNA 
is extracted, reverse transcribed and fluorophore-labeled 
according to standard methods (Sambrook et al., Molecular 
Cloning, 3rd ed.). The resultant cDNA is hybridized to the 
array of probes, the array is washed free of unhybridized 
labeled cDNA, the hybridization signal at each unit of the 
array quantified using a confocal microscope scanner 
(instruments by Molecular Devices and Affymetrix), and the 
resultant matrix response data stored in digital form. 
3. Two-dimensional two-hybrid matrix 
A) Construction of stimulated physical matrix. 
The two-dimensional two-hybrid (see, e.g. Chien et al. 
(1991) PNAS, 88, 9578) matrix is designed to screen for 
compounds that specifically affect the interaction of two 
proteins, e.g. the interaction of a human signal transducer 
and activator of transcription (STAT) with an interleukin 
receptor. Two hybrid fusions are generated by standard 
methods: each strain contains a portion of the targeted 
human STAT gene, fused to a portion of a yeast or bacterial 
gene encoding a DNA binding domain (e.g. GAL4:1-147). 
The DNA sequence recognized by that DNA binding domain 
(e.g. UASG) is inserted in place of the enhancer sequence 5' 
to the selected reporter (e.g. lacZ). The strain also contains 
another fusion consisting of an intracellular portion of the 
targeted receptor gene whose protein product interacts with 
the STAT. This receptor gene is fused with a gene fragment 
encoding a transcriptional activation domain (e.g. 
GAL4:768-881). 

B) Generation of signal matrix data structure. 
Both the turbidity and the galactosidase are read on 
commercial microtiter plate readers (BioRad) and the data 
captured as an ASCII file. 

C) Comparison of signal matrix data structure with data- 
base. 
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Data are analyzed for those compounds that block the 
interaction of the two human proteins by reducing the signal 
produced from the reporter in the various strains containing 
pairs of human proteins. The output is processed to identify 
compounds with a large impact on a reporter whose expres- 
sion is dependent on a single pair of interacting human 
proteins. An inverted weighting matrix is used to evaluate 
these data as preferred compounds do not affect even the 
least specific reporters in the matrix. 

All publications and patent applications cited in this 
specification are herein incorporated by reference as if each 
individual publication or patent application were specifically 
and individually indicated to be incorporated by reference. 
Although the foregoing invention has been described in 
some detail by way of illustration and example for purposes 
of clarity of understanding, it will be readily apparent to 
those of ordinary skill in the art in light of the teachings of 
this invention that certain changes and modifications may be 
made thereto without departing from the spirit or scope of 
the appended claims. 
What is claimed is: 

1. A method for modeling of the transcriptional respon- 
siveness of an organism to a candidate drug which has an 
effect on gene transcription in cells of said organism, com- 
prising steps: 

(a) detecting reporter gene product signals from each of a 
plurality of different, separately isolated cells of a target 
organism, wherein each of said cells contains a recom- 
binant construct comprising a reporter gene operatively 
linked to a different endogenous transcriptional regu- 
latory element of said target organism such that said 
transcriptional regulatory element regulates the expres- 
sion of said reporter gene, wherein said plurality of 
cells comprises an ensemble of the transcriptional 
regulatory elements of said organism sufficient to 
model the transcriptional responsiveness of said organ- 
ism to a drug; 

(b) contacting each of said cells with a candidate drug 
under conditions, wherein said cells maintain homeo- 
stasis; 

(c) detecting reporter gene product signals from each of 
said cells; 

(d) comparing said reporter gene product signals from 
each of said cells before and after contacting each of 
said cells with said candidate drug to obtain a drug 
response profile; 

wherein said drug response profile provides a model of 
the transcriptional responsiveness of said organism to 
said candidate drug. 
2. A method according to claim 1, said ensemble com- 
prising a majority of all different transcriptional regulatory 
elements of said organism. 

3. A method according to claim 1, said drug being a 
candidate human therapeutic. 

4. A method according to claim 1, wherein said cells are 
yeast cells. 

5. A method according to claim 1, wherein said cells are 
bacterial cells. 

6. A method according to claim 1, wherein said cells are 
human cells. 

7. A method according to claim 1, wherein the reporter 
gene is the lacZ gene, the suc2 gene, or a gene encoding a 
green fluorescent protein. 

8. A method according to claim 1, wherein said cells are 
eukaryotic cells. 
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Contributed by Ronald W. Davis, December 27, 1996 

ABSTRACT cDNA microarray technology is used to profile 
complex diseases and discover novel disease-related genes. In 
inflammatory disease such as rheumatoid arthritis, expression 
patterns of diverse cell types contribute to the pathology. We 
have monitored gene expression in this disease state with a 
microarray of selected human genes of probable significance in 
inflammation as well as with genes expressed in peripheral 
human blood cells. Messenger RNA from cultured macrophages, 
chondrocyte cell lines, primary chondrocytes, and synoviocytes 
provided expression profiles for the selected cytokines, chemo- 
kines, DNA binding proteins, and matrix-degrading metal- 
loproteinases. Comparisons between tissue samples of rheuma- 
toid arthritis and inflammatory bowel disease verified the in- 
volvement of many genes and revealed novel participation of the 
cytokine interleukin 3, chemokine Groα and the metal- 
loproteinase matrix metallo-elastase in both diseases. From the 
peripheral blood library, tissue inhibitor of metalloproteinase 1, 
ferritin light chain, and manganese superoxide dismutase genes 
were identified as expressed differentially in rheumatoid arthri- 
tis compared with inflammatory bowel disease. These results 
successfully demonstrate the use of the cDNA microarray system 
as a general approach for dissecting human diseases. 

The recently described cDNA microarray or DNA-chip tech- 
nology allows expression monitoring of hundreds and thou- 
sands of genes simultaneously and provides a format for 
identifying genes as well as changes in their activity (1, 2). 
Using this technology, two-color fluorescence patterns of 
differential gene expression in the root versus the shoot tissue 
of Arabidopsis were obtained in a specific array of 48 genes (1). 
In another study using a 1000 gene array from a human 
peripheral blood library, novel genes expressed by T cells were 
identified upon heat shock and protein kinase C activation (3). 
The technology uses cDNA sequences or cDNA inserts of a 
library for PCR amplification that are arrayed on a glass slide with 
high speed robotics at a density of 1000 cDNA sequences per cm². 
These microarrays serve as gene targets for hybridization to 
cDNA probes prepared from RNA samples of cells or tissues. A 
two-color fluorescence labeling technique is used in the prepa- 
ration of the cDNA probes such that a simultaneous hybridization 
but separate detection of signals provides the comparative anal- 
ysis and the relative abundance of specific genes expressed (1, 2). 
Microarrays can be constructed from specific cDNA clones of 
interest, a cDNA library, or a select number of open reading 
frames from a genome sequencing database to allow a large-scale 
functional analysis of expressed sequences. 
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Because of the wide spectrum of genes and endogenous 
mediators involved, the microarray technology is well suited 
for analyzing chronic diseases. In rheumatoid arthritis (RA), 
inflammation of the joint is caused by the gene products of 
many different cell types present in the synovium and cartilage 
tissues plus those infiltrating from the circulating blood. The 
autoimmune and inflammatory nature of the disease is a 
cumulative result of genetic susceptibility factors and multiple 
responses, paracrine and autocrine in nature, from macro- 
phages, T cells, plasma cells, neutrophils, synovial fibroblasts, 
chondrocytes, etc. Growth factors, inflammatory cytokines 
(4), and the chemokines (5) are the important mediators of this 
inflammatory process. The ensuing destruction of the cartilage 
and bone by the invading synovial tissue includes the actions 
of prostaglandins and leukotrienes (6), and the matrix degrad- 
ing metalloproteinases (MMPs). The MMPs are an important 
class of Zn-dependent metallo-endoproteinases that can col- 
lectively degrade the proteoglycan and collagen components of 
the connective tissue matrix (7). 

This paper presents a study in which the involvement of 
select classes of molecules in RA was examined. Also inves- 
tigated were 1000 human genes randomly selected from a 
peripheral human blood cell library. Their differential and 
quantitative expression analysis in cells of the joint tissue in 
diseased RA tissue and in inflammatory bowel disease (IBD) 
tissues was conducted to demonstrate the utility of the mi- 
croarray method to analyze complex diseases by their pattern 
of gene expression. Such a survey provides insight not only into 
the underlying cause of the pathology, but also provides the 
opportunity to selectively target genes for disease intervention 
by appropriate drug development and gene therapies. 

METHODS 

Microarray Design, Development, and Preparation. Two ap- 
proaches for the fabrication of cDNA microarrays were used in 
this study. In the first approach, known human genes of probable 
significance in RA were identified. Regions of the clones, pref- 
erably 1 kb in length, were selected by their proximity to the 3' end 
of the cDNA and for areas of least identity to related and 
repetitive sequences. Primers were synthesized to amplify the 
target regions by standard PCR protocols (3). Products were 

Abbreviations: RA, rheumatoid arthritis; MMP, matrix-degrading 
metalloproteinase; IBD, inflammatory bowel disease; LPS, lipopoly- 
saccharide; PMA, phorbol 12-myristate 13-acetate; TNF-α, tumor 
necrosis factor α; IL, interleukin; TGF-β, transforming growth factor 
β; GCSF, granulocyte colony-stimulating factor; MIP, macrophage 
inflammatory protein; MIF, migration inhibitory factor; HME, human 
matrix metallo-elastase; RANTES, regulated upon activation, normal 
T cell expressed and secreted; Gel, gelatinase; VCAM, vascular cell 
adhesion molecule; ICE, IL-1 converting enzyme; PUMP, putative 
metalloproteinase; MnSOD, manganese superoxide dismutase; TIMP, 
tissue inhibitor of metalloproteinase; MCP, macrophage chemotactic 
protein. 

To whom reprint requests should be sent at the present address: 
Roche Bioscience, S3-1, 3401 Hillview Avenue, Palo Alto, CA 94304. 
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verified by gel electrophoresis and purified with Qiaquick 96-well 
purification kit (Qiagen, Chatsworth, CA), lyophilized (Savant), 
and resuspended in 5 µl of 3× standard saline citrate (SSC) buffer 
for arraying. In the second approach, the microarray containing 
the 1056 human genes from the peripheral blood lymphocyte 
library was prepared as described (3). 

Tissue Specimens. Rheumatoid synovial tissue was obtained 
from patients with late stage classic RA undergoing remedial 
synovectomy or arthroplasty of the knee. Synovial tissue was 
separated from any associated connective tissue or fat. One 
gram of each synovial specimen was subjected to RNA extrac- 
tion within 40 min of surgical excision, or explants were 
cultured in serum-free medium to examine any changes under 
in vitro conditions. For IBD, specimens of macroscopically 
inflamed lower intestinal mucosa were obtained from patients 
with Crohn disease undergoing remedial surgery. The hyper- 
trophied mucosal tissue was separated from underlying con- 
nective tissue and extracted for RNA. 

Cultured Cells. The Mono Mac-6 (MM6) monocytic cells 
(8) were grown in RPMI medium. Human chondrosarcoma 
SW1353 cells, primary human chondrocytes, and synoviocytes 
(9, 10) were cultured in DMEM; all culture media were 
supplemented with 10% fetal bovine serum, 100 µg/ml strep- 
tomycin, and 500 units/ml penicillin. Treatment of cells with 
lipopolysaccharide (LPS) endotoxin at 30 ng/ml, phorbol 
12-myristate 13-acetate (PMA) at 50 ng/ml, tumor necrosis 
factor α (TNF-α) at 50 ng/ml, interleukin (IL)-1β at 30 ng/ml, 
or transforming growth factor-β (TGF-β) at 100 ng/ml is 
described in the figure legends. 
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Fluorescent Probe, Hybridization, and Scanning. Isolation of 
mRNA, probe preparation, and quantitation with Arabidopsis 
control mRNAs was essentially as described (3) except for the 
following minor modification. Following the reverse transcriptase 
step, the appropriate Cy3- and Cy5-labeled samples were pooled; 
mRNA degraded by heating the sample to 65°C for 10 min with 
the addition of 5 µl of 0.5 M NaOH plus 0.5 ml of 10 mM EDTA. 
The pooled cDNA was purified from unincorporated nucleotides 
by gel filtration in Centri-spin columns (Princeton Separations, 
Adelphia, NJ). Samples were lyophilized and dissolved in 6 ul of 
hybridization buffer (5× SSC plus 0.2% SDS). Hybridizations, 
washes, scanning, quantitation procedures, and pseudocolor rep- 
resentations of fluorescent images have been described (3). Scans 
for the two fluorescent probes were normalized either to the 
fluorescence intensity of Arabidopsis mRNAs spiked into the 
labeling reactions (see Figs. 2-4) or to the signal intensity of 
β-actin and glyceraldehyde-3-phosphate dehydrogenase 
(GAPDH; see Fig. 5). 
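
A minimal sketch of how a control-based normalization of the two fluorescent channels could be computed is given below. It assumes, and this is not stated in the text, that a single scale factor is derived from the summed intensities of the control spots and applied to the Cy5 channel before ratios are taken; the actual procedure is the one described in reference 3. The function name and spot labels are hypothetical.

```python
from typing import Dict, Iterable


def normalized_ratios(
    cy3: Dict[str, float],
    cy5: Dict[str, float],
    controls: Iterable[str],
) -> Dict[str, float]:
    """Return Cy5/Cy3 ratios after scaling Cy5 so the control spots balance."""
    control_list = list(controls)
    # Scale factor that makes the summed control signal equal in both channels.
    factor = sum(cy3[c] for c in control_list) / sum(cy5[c] for c in control_list)
    return {spot: (cy5[spot] * factor) / cy3[spot] for spot in cy3}
```

With this scaling a control spot yields a ratio of one, so a ratio above one at any other spot indicates relatively higher signal in the Cy5-labeled sample.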

RESULTS 

Ninety-Six-Gene Microarray Design. The actions of cytokines, 
growth factors, chemokines, transcription factors, MMPs, pros- 
taglandins, and leukotrienes are well recognized in inflammatory 
disease, particularly RA (11-14). Fig. 1 displays the selected genes 
for this study and also includes control cDNAs of housekeeping 
genes such as β-actin and GAPDH and genes from Arabidopsis 
for signal normalization and quantitation (row A, columns 1-12). 

Defining Microarray Assay Conditions. Different lengths and 
concentrations of target DNA were tested by arraying PCR- 
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amplified products ranging from 0.2 to 1.2 kb at concentrations 
of 1 µg/µl or less. No significant difference in the signal levels was 
observed within this range of target size and only with 0.2-kb 
length was a signal reduced upon an 8-fold dilution of the 1 µg/µl 
sample (data not shown). In this study the average length of the 
targets was 1 kb, with a few exceptions in the range of ~300 bp, 
arrayed at a concentration of 1 µg/µl. Normally one PCR pro- 
vided sufficient material to fabricate up to 1000 microarray targets. 

In considering positional effects in the development of the 
targets for the microarrays, selection was biased toward the 3' 
proximal regions, because the signal was reduced if the target 
fragment was biased toward the 5' end (data not shown). This 
result was anticipated since the hybridizing probe is prepared by 
reverse transcription with oligo(dT)-primed mRNA and is richer 
in 3' proximal sequences. Cross-hybridizations of probes to 
targets of a gene family were analyzed with the matrix metalloproteinases
as the example because they can show regions of
sequence identities of greater than 70%. With collagenase-1
(Col-1) and collagenase-2 (Col-2) genes as targets with up to 70%
sequence identity, and stromelysin-1 (Strom-1) and stromelysin-2 
(Strom-2) genes with different degrees of identity, our results 
showed that a short region of overlap, even with 70-90% se- 
quence identity, produced a low level of cross-hybridization. 
However, shorter regions of identity spread over the length of the 
target resulted in cross-hybridization (data not shown). For 
closely related genes, targets were designed by avoiding long 
stretches of homology. For members of a gene family two or more 
target regions were included to discriminate between specificity 
of signal versus cross-hybridization. 

Monitoring Differential Expression in Cultured Cell Lines. In
RA tissue, the monocyte/macrophage population plays a prominent
role in phagocytic and immunomodulatory activities. Typically
these cells, when triggered by an immunogen, produce the
proinflammatory cytokines TNF and IL-1. We have used the
monocyte cell line MM6 and monitored changes in gene expres- 
sion upon activation with LPS endotoxin, a component of Gram- 
negative bacterial membranes, and PMA, which augments the 
action of LPS on TNF production (15). RNA was isolated at 
different times after induction and used for cDNA probe prep- 
aration. From this time course it was clear that TNF expression 
was induced within 15 min of treatment, reached maximum levels 
in 1 hr, remained high until 4 hr and subsequently declined (Fig.
2A). Many other cytokine genes were also transiently activated,
such as IL-1α and -β, IL-6, and granulocyte colony-stimulating
factor (GCSF). Prominent chemokines activated were IL-8, macrophage
inflammatory protein (MIP)-1β, more so than MIP-1α,
and Groα or melanoma growth stimulatory factor. Migration
inhibitory factor (MIF) expressed in the uninduced state declined
in LPS-activated cells. Of the immediate early genes, the noticeable
ones were c-fos, fra-1, c-jun, NF-κBp50, and IκB, with c-rel
expression observed even in the uninduced state (Fig. 2B). These
expression patterns are consistent with reported patterns of 
activation of certain LPS- and PMA-induced genes (12). Dem- 
onstrated here is the unique ability of this system to allow parallel 
visualization of a large number of gene activities over a period of 
time. 

SW1353 is a cell line derived from malignant tumors of the
cartilage and behaves much like chondrocytes upon stimulation
with TNF and IL-1 in the expression of MMPs (9). In
addition to confirming our earlier observations with Northern
blots on Strom-1, Col-1, and Col-3 expression (9), gelatinase
(Gel) A, putative metalloproteinase (PUMP)-1, membrane-type
matrix metalloproteinase, tissue inhibitors of matrix
metalloproteinases or tissue inhibitor of metalloproteinase 1
(TIMP-1), -2, and -3 were also expressed by these cells together
with the human matrix metallo-elastase (HME; Fig. 3A). HME
induction was estimated to be ~50-fold and was greater than
any of the other MMPs examined (Fig. 3B). This result was
unexpected because HME is reportedly expressed only by
alveolar macrophage and placental cells (16). Expression of
the cytokines and chemokines IL-6, IL-8, MIF, and MIP-1β
was also noted. A variety of other genes, including certain
transcription factors, were also up-regulated (Fig. 3), but the
overall time-dependent expression of genes in the SW1353
cells was qualitatively distinct from the MM6 cells.

Fig. 3. Time course for IL-1β and TNF-induced SW1353 cells
using the inflammation array (Fig. 1). (A) Pseudocolor representation
of fluorescent scans corresponding to gene expression levels at each time
point. (B I-IV) Relative levels of selected genes at different time points
compared with time zero.

Quantitation of differential gene expression (Figs. 2B and
3B) was achieved with the simultaneous hybridization of
Cy3-labeled cDNA from untreated cells and Cy5-labeled
cDNA from treated samples. The estimated increases in
expression from these microarrays for a select number of genes
including IL-1β, IL-8, MIP-1β, TNF, HME, Col-1, Col-3,
Strom-1, and Strom-2 were compared with data collected from 
dot blot analysis. Results (not shown) were in close agreement 
and confirmed our earlier observations on the use of the 
microarray method for the quantitation of gene expression (3). 

Expression Profiles in Primary Chondrocytes and Synoviocytes
of Human RA Tissue. Given the sensitivity and the
specificity of this method, expression profiles of primary
synoviocytes and chondrocytes from diseased tissue were
examined. Without prior exposure to inducing agents, low level
expression of c-jun, GCSF, IL-3, TNF-β, MIF, and RANTES
(regulated upon activation, normal T cell expressed and se- 
creted) was seen as well as expression of MMPs, GelA, 
Strom-1, Col-1, and the three TIMPs. In this case, Col-2 
hybridization was considered to be nonspecific because the 
second Col-2 target taken from the 3' end of the gene gave no 

Fig. 4. Expression profiles for early passage primary synoviocytes and
chondrocytes isolated from RA tissue, cultured in the presence of 10%
fetal calf serum and activated with PMA and IL-1β, or TNF and IL-1β,
or TGF-β for 18 hr. (A) Human synovial fibroblasts. (B) Human articular
chondrocytes. The color bars provide a comparative calibration scale
between arrays and are derived from the Arabidopsis mRNA samples that
are introduced in equal amounts during probe preparation

[...] In IBD [...] one of three individual
samples, ICE, VCAM, Groα, and MMP expression was more
pronounced than in the others.

[...] made use of a peripheral blood cDNA library (3)
to identify genes expressed by lymphocytes infiltrating the
inflamed tissues from the circulating blood. With the 1046-element
array of randomly selected cDNAs from this library [...]

between the two disease tissues while others were differentially
expressed (data not shown). A complete survey of these genes
was beyond the scope of this study, but for this report we [...]

relative to IBD. These cDNAs were sequenced and identified
by comparison to the GenBank database. They are TIMP-1 [...]

(MnSOD). Differential expression of MnSOD was only observed
in samples of RA tissue explants maintained in growth
medium without serum for anywhere between 2 to 16 hr. These
results also indicate that the expression profile of genes can be
altered when explants are transferred to culture conditions.

DISCUSSION 

The speed, ease, and feasibility of simultaneously monitoring
differential expression of hundreds of genes with the cDNA
microarray-based system (1-3) is demonstrated here in the
analysis of a complex disease such as RA. Many different cell
types in the RA tissue (macrophages, lymphocytes, plasma cells,
[...], chondrocytes, etc.) are known to contribute
to the development of the disease with the expression of
gene products known to be proinflammatory. They include the
[...] growth factors, MMPs, eicosanoids, and
others (7, 11-14), and the design of the 96-element known gene
microarray was based on this knowledge and depended on the
availability of the genes. The technology was validated by confirming
earlier observations on the expression of TNF by the
monocyte cell line MM6, and of Col-1 and Col-3 expression in the
chondrosarcoma cells and articular chondrocytes (9, 12). In our
[...] survey the chronological order of gene activities
in and between gene families was compared and the results [...]
profiles of the cytokines ([...]
IL-6, GCSF, and MIF), chemokines (MIP-1α, MIP-1β, [...] and
Groα), certain transcription factors, and the matrix metalloproteinases
(GelA, Strom-1, Col-1 [...]

[...] cytokine production in the disease state had
established a model in which TNF is a major participant in RA.
Its expression reportedly preceded that of the other cytokines and
effector molecules (4). Our results strongly support [...]
as demonstrated in the time course of the MM6 cells, where TNF
induction preceded that of IL-1α and IL-1β, followed by IL-6 and
GCSF. These expression profiles demonstrate the utility of the [...]

[...] of signaling events.
In the [...] chondrosarcoma cells, all the known MMPs and
TIMPs were examined simultaneously. HME expression was
discovered, which previously had been observed in only the
stromal cells and alveolar macrophages of smoker's lungs and in
[...] tissue. Its presence in cells of the RA tissue is meaningful
[...] significant destruction of
elastin and basement membrane components (16, 17). Expression
profiles of synovial fibroblasts and articular chondrocytes were
remarkably similar and not too different from the SW1353 cells,
[...] the fibroblast and chondrocyte can play equally
aggressive roles in joint erosion. Prominent genes expressed were
the MMPs, but chemokines and cytokines were also produced by
these cells. The effect of the anabolic growth factor TGF-β was
profoundly evident in demonstrating the down regulation of these
catabolic activities.

RA tissue samples undeniably reflected profiles similar to
the cell types examined. Active genes observed were IL-3, IL-6,
ICE, the MMPs including HME and TIMPs, chemokines IL-8,
[...], MIF, and RANTES, and the adhesion molecule
VCAM. Of the growth factors, fibroblast growth factor Q was
observed most frequently. In comparison, the expression 
patterns in the other inflammatory state (i.e., IBD) were not 
as marked as in the RA samples, at least as obtained from the 
tissue samples selected for this study. 

As an alternative approach, the 1046 cDNA microarray of
randomly selected genes from a lymphocyte library was used to
identify genes expressed in RA tissue (3). Many genes on this
array hybridized with probes made from both RA and IBD tissue
samples. The results are not surprising because inflammatory 
tissue is abundantly supplied with cell types infiltrating from the 
circulating blood, made apparent also by the high levels of 
chemokine expression in RA tissue. Because of the magnitude of 
the effort required to identify all the hybridized genes, we have for 
this report chosen to describe only three differentially expressed 
genes mainly to verify this method of analysis. 

Of the large number of genes observed here, a fair number
were already known as active participants in inflammatory disease.
These are TNF, IL-1, IL-6, IL-8, GCSF, RANTES, and
[...]. Novel participants not previously reported are
[...], ICE, and Groα. With our discovery of HME
expression in RA, this gene becomes a target for drug intervention.
ICE is a cysteine protease well known for its IL-1β processing
activity (18), and recognized for its role in apoptotic cell death
(19). Its expression in RA tissue is intriguing. IL-3, recognized
for its growth-promoting activity in hematopoietic cell lineages, is
a product of activated T cells (20), and its expression in synoviocytes
and chondrocytes of RA tissue is a novel observation.

Like IL-8, Groα is a C-X-C subgroup chemokine and is a
potent neutrophil and basophil chemoattractant. It down-regulates
the expression of types I and III interstitial collagens
(21, 22) and is seen here produced by the MM6 cells, in primary
synoviocytes, and in RA tissue. With the presence of RANTES,
MCP, and MIP-1β, the C-C chemokines (23), migration and
infiltration of monocytes, particularly T cells, into the tissue is
also enhanced (5), aiding the trafficking and recruitment of
leukocytes into the RA tissue. Their activation, phagocytosis,
degranulation, and respiratory bursts could be responsible for
the induction of MnSOD in RA. MnSOD is also induced by
TNF and IL-1 and serves a protective function against oxidative
damage. The induction of the ferritin light chain encoding
[...] tissue may be for reasons similar to those for
MnSOD. Ferritin is the major intracellular iron storage protein
and it is responsive to intracellular oxidative stress and reactive
oxygen intermediates generated during inflammation (24, 25).

The [...] expression of TIMP-1 in RA tissue, as detected by
the 1000-element array, is no surprise because our results have
repeatedly shown TIMP-1 to be expressed in the constitutive
and induced states of RA cells and tissues.

The suitability of the cDNA microarray technology for
profiling diseases and for identifying disease-related genes is
well documented here. This technology could provide new
targets for drug development and disease therapies, and in
doing so allow for improved treatment of chronic diseases that
are challenging because of their complexity.

MEASUREMENT OF GENE EXPRESSION PROFILES
IN TOXICITY DETERMINATION

Field of the Invention 
The invention relates generally to methods for detecting and monitoring 
phenotypic changes in in vitro and in vivo systems for assessing and/or determining 
the toxicity of chemical compounds, and more particularly, the invention relates to a 
method for detecting and monitoring changes in gene expression patterns in in vitro 
and in vivo systems for determining the toxicity of drug candidates. 

BACKGROUND 

The ability to rapidly and conveniently assess the toxicity of new compounds 
is extremely important. Thousands of new compounds are synthesized every year, 
and many are introduced to the environment through the development of new 
commercial products and processes, often with little knowledge of their short term 
and long term health effects. In the development of new drugs, the cost of assessing 
the safety and efficacy of candidate compounds is becoming astronomical: It is
estimated that the pharmaceutical industry spends an average of about 300 million
dollars to bring a new pharmaceutical compound to market, e.g. Biotechnology, 13:
226-228 (1995). A large fraction of these costs is due to the failure of candidate
compounds in the later stages of the developmental process. That is, as the
assessment of a candidate drug progresses from the identification of a compound as a
drug candidate (for example, through relatively inexpensive binding assays or in vitro
screening assays), to pharmacokinetic studies, to toxicity studies, to efficacy studies in
model systems, to preliminary clinical studies, and so on, the costs of the associated
tests and analyses increase tremendously. Consequently, it may cost several tens of
millions of dollars to determine that a once promising candidate compound possesses
a side effect or cross reactivity that renders it commercially infeasible to develop
further. A great challenge of pharmaceutical development is to remove from further 
consideration as early as possible those compounds that are likely to fail in the later 
stages of drug testing. 

Drug development programs are clearly structured with this objective in mind;
however, rapidly escalating costs have created a need to develop even more stringent
and less expensive screens in the early stages to identify false leads as soon as 
possible. Toxicity assessment is an area where such improvements may be made, for 
both drug development and for assessing the environmental, health, and safety effects 
of new compounds in general.

[...] products. While such approaches are attractive, those based on exhaustive, or even
sampled, sequencing of expressed genes are still beset by the enormous effort
required: It is estimated that 30-35 thousand different genes are expressed in a typical
mammalian tissue in any given state, e.g. Ausubel et al, Editors, Current Protocols,
5.8.1-5.8.4 (John Wiley & Sons, New York, 1992). Determining the sequences of
even a small sample of that number of gene products is a major enterprise, requiring 
industrial-scale resources. Thus, the routine application of massive sequencing of 
expressed genes is still beyond current commercial technology. 

The availability of new assays for assessing the toxicity of compounds, such 
as candidate drugs, that would provide more comprehensive and precise information
about the state of health of a test animal would be highly desirable. Such additional 
assays would preferably be less expensive, more rapid, and more convenient than 
current testing procedures, and would at the same time provide enough information to 
make early judgments regarding the safety of new compounds. 

Summary of the Invention 
An object of the invention is to provide a new approach to toxicity assessment 
based on an examination of gene expression patterns, or profiles, in in vitro or in vivo 
test systems.

Another object of the invention is to provide a database on which to base 
decisions concerning the toxicological properties of chemicals, particularly drug 
candidates. 

A further object of the invention is to provide a method for analyzing gene 
expression patterns in selected tissues of test animals. 

A still further object of the invention is to provide a system for identifying 
genes which are differentially expressed in response to exposure to a test compound. 

Another object of the invention is to provide a rapid and reliable method for 
correlating gene expression with short term and long term toxicity in test animals. 

Another object of the invention is to identify genes whose expression is 
predictive of deleterious toxicity. 

The invention achieves these and other objects by providing a method for 
massively parallel signature sequencing of genes expressed in one or more selected 
tissues of an organism exposed to a test compound. An important feature of the 
invention is the application of novel DNA sorting and sequencing methodologies that 
permit the formation of gene expression profiles for selected tissues by determining 
the sequence of portions of many thousands of different polynucleotides in parallel. 
Such profiles may be compared with those from tissues of control organisms at single 
or multiple time points to identify expression patterns predictive of toxicity. 



WO 97/13877 



PCT/US96/16342 



The sorting methodology of the invention makes use of oligonucleotide tags
that are members of a minimally cross-hybridizing set of oligonucleotides. The
sequences of oligonucleotides of such a set differ from the sequences of every other
member of the same set by at least two nucleotides. Thus, each member of such a set
cannot form a duplex (or triplex) with the complement of any other member with less
than two mismatches. Complements of oligonucleotide tags of the invention, referred
to herein as "tag complements," may comprise natural nucleotides or non-natural
nucleotide analogs. Preferably, tag complements are attached to solid phase supports.
Such oligonucleotide tags when used with their corresponding tag complements
provide a means of enhancing specificity of hybridization for sorting polynucleotides
such as cDNAs.
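
The defining property of such a set can be illustrated with a minimal sketch. It assumes equal-length tags written as strings and models "differ by at least two nucleotides" as a position-by-position mismatch count. The greedy filter shown is only one simple way to build a set with this property and is not presented as the algorithm of Figure 1; all function names are hypothetical.

```python
from itertools import combinations, product


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_minimally_cross_hybridizing(tags, min_mismatches=2):
    """True if every pair of tags differs in at least min_mismatches positions."""
    return all(
        mismatches(a, b) >= min_mismatches for a, b in combinations(tags, 2)
    )


def greedy_set(candidates, min_mismatches=2):
    """Keep each candidate that is far enough from every tag kept so far."""
    kept = []
    for cand in candidates:
        if all(mismatches(cand, tag) >= min_mismatches for tag in kept):
            kept.append(cand)
    return kept


# Illustration: filter all 4-mers over the four natural nucleotides.
all_4mers = ["".join(p) for p in product("ACGT", repeat=4)]
tag_set = greedy_set(all_4mers)
```

A mismatch count is a simplification: it ignores base composition and duplex stability, which would also matter when choosing tags for real hybridization conditions.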

The polynucleotides to be sorted each have an oligonucleotide tag attached,
such that different polynucleotides have different tags. As explained more fully
below, this condition is achieved by employing a repertoire of tags substantially
greater than the population of polynucleotides and by taking a sufficiently small
sample of tagged polynucleotides from the full ensemble of tagged polynucleotides.
After such sampling, when the populations of supports and polynucleotides are mixed
under conditions which permit specific hybridization of the oligonucleotide tags with
their respective complements, identical polynucleotides sort onto particular beads or
regions. The sorted populations of polynucleotides can then be sequenced on the 
solid phase support by a "single-base" or "base-by-base" sequencing methodology as 
described more fully below. 
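
Why a large repertoire and a small sample keep tags distinct can be seen from a rough calculation. The sketch below assumes that tags are attached uniformly and independently at random, and it conservatively treats every sampled molecule as a different polynucleotide. The function name and the numbers are illustrative assumptions and do not come from the specification.

```python
def fraction_with_unique_tag(repertoire_size, sample_size):
    """Expected fraction of sampled molecules whose tag is unique in the sample.

    Under the stated assumptions, a given molecule keeps a unique tag when
    none of the other (sample_size - 1) molecules received the same tag.
    """
    if repertoire_size < 1 or sample_size < 1:
        raise ValueError("repertoire_size and sample_size must be at least 1")
    return (1.0 - 1.0 / repertoire_size) ** (sample_size - 1)


# Made-up sizes: one million tags, samples of increasing size.
small_sample = fraction_with_unique_tag(1_000_000, 10_000)
large_sample = fraction_with_unique_tag(1_000_000, 1_000_000)
```

With these assumed numbers, sampling 1% as many molecules as there are tags leaves roughly 99% of sampled molecules with a tag of their own, whereas sampling as many molecules as there are tags leaves only about 37%.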

In one aspect, the method of the invention comprises the following steps: (a)
administering the compound to a test organism; (b) extracting a population of mRNA
molecules from each of one or more tissues of the test organism; (c) forming a
separate population of cDNA molecules from each population of mRNA molecules
extracted from the one or more tissues such that each cDNA molecule of the separate
populations has an oligonucleotide tag attached, the oligonucleotide tags being
selected from the same minimally cross-hybridizing set; (d) separately sampling each
population of cDNA molecules such that substantially all different cDNA molecules
within a separate population have different oligonucleotide tags attached; (e) sorting
the cDNA molecules of each separate population by specifically hybridizing the
oligonucleotide tags with their respective complements, the respective complements
being attached as uniform populations of substantially identical complements in
spatially discrete regions on one or more solid phase supports; (f) determining the
nucleotide sequence of a portion of each of the sorted cDNA molecules of each
separate population to form a frequency distribution of expressed genes for each of
the one or more tissues; and (g) correlating the frequency distribution of expressed
genes in each of the one or more tissues with the toxicity of the compound.

An important aspect of the invention is the identification of genes whose 
expression is predictive of the toxicity of a compound. Once such genes are 
identified, they may be employed in conventional assays, such as reverse transcriptase
polymerase chain reaction (RT-PCR) assays for gene expression. 

Brief Description of the Drawings 
Figure 1 is a flow chart representation of an algorithm for generating 
minimally cross-hybridizing sets of oligonucleotides.

Figure 2 diagrammatically illustrates an apparatus for carrying out
polynucleotide sequencing in accordance with the invention.

Definitions

"Complement" or "tag complement" as used herein in reference to
oligonucleotide tags refers to an oligonucleotide to which an oligonucleotide tag
specifically hybridizes to form a perfectly matched duplex or triplex. In embodiments
where specific hybridization results in a triplex, the oligonucleotide tag may be
selected to be either double stranded or single stranded. Thus, where triplexes are
formed, the term "complement" is meant to encompass either a double stranded 
complement of a single stranded oligonucleotide tag or a single stranded complement 
of a double stranded oligonucleotide tag. 

The term "oligonucleotide" as used herein includes linear oligomers of natural 
or modified monomers or linkages, including deoxyribonucleosides, ribonucleosides, 
anomeric forms thereof, peptide nucleic acids (PNAs), and the like, capable of 
specifically binding to a target polynucleotide by way of a regular pattern of 
monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base 
stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually 
monomers are linked by phosphodiester bonds or analogs thereof to form 
oligonucleotides ranging in size from a few monomeric units, e.g. 3-4, to several tens 
of monomeric units. Whenever an oligonucleotide is represented by a sequence of
letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'->3' 
order from left to right and that "A" denotes deoxyadenosine, "C" denotes 
deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine, unless 
otherwise noted. Analogs of phosphodiester linkages include phosphorothioate, 
phosphorodithioate, phosphoranilidate, phosphoramidate, and the like. Usually 
oligonucleotides of the invention comprise the four natural nucleotides; however, they 
may also comprise non-natural nucleotide analogs. It is clear to those skilled in the
art when oligonucleotides having natural or non-natural nucleotides may be
employed, e.g. where processing by enzymes is called for, usually oligonucleotides
consisting of natural nucleotides are required.

"Perfectly matched" in reference to a duplex means that the poly- or 
oligonucleotide strands making up the duplex form a double stranded structure with
one another such that every nucleotide in each strand undergoes Watson-Crick
basepairing with a nucleotide in the other strand. The term also comprehends the
pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine
bases, and the like, that may be employed. In reference to a triplex, the term means
that the triplex consists of a perfectly matched duplex and a third strand in which
every nucleotide undergoes Hoogsteen or reverse Hoogsteen association with a
basepair of the perfectly matched duplex. Conversely, a "mismatch" in a duplex
between a tag and an oligonucleotide means that a pair or triplet of nucleotides in the
duplex or triplex fails to undergo Watson-Crick and/or Hoogsteen and/or reverse
Hoogsteen bonding.

As used herein, "nucleoside" includes the natural nucleosides, including 2'- 
deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA 
Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in reference to 
nucleosides includes synthetic nucleosides having modified base moieties and/or
modified sugar moieties, e.g. described by Scheit, Nucleotide Analogs (John Wiley,
New York, 1980); Uhlman and Peyman, Chemical Reviews, 90: 543-584 (1990); or
the like, with the only proviso that they are capable of specific hybridization. Such
analogs include synthetic nucleosides designed to enhance binding properties, reduce 
complexity, increase specificity, and the like. 

As used herein "sequence determination" or "determining a nucleotide 
sequence" in reference to polynucleotides includes determination of partial as well as 
full sequence information of the polynucleotide. That is, the term includes sequence
comparisons, fingerprinting, and like levels of information about a target
polynucleotide, as well as the express identification and ordering of nucleosides,
usually each nucleoside, in a target polynucleotide. The term also includes the
determination of the identification, ordering, and locations of one, two, or three of the
four types of nucleotides within a target polynucleotide. For example, in some
embodiments sequence determination may be effected by identifying the ordering and
locations of a single type of nucleotide, e.g. cytosines, within the target polynucleotide
"CATCGC ..." so that its sequence is represented as a binary code, e.g. "100101 ..." for
"C-(not C)-(not C)-C-(not C)-C ..." and the like.

As used herein, the term "complexity" in reference to a population of
polynucleotides means the number of different species of molecule present in the
population. 
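
Returning to the definition of "sequence determination" above, the binary code for a single nucleotide type can be illustrated with a short sketch. The function name is hypothetical, and the input is the example sequence given in that definition.

```python
def binary_signature(sequence, base="C"):
    """Encode a sequence as 1 where the chosen base occurs and 0 elsewhere."""
    base = base.upper()
    return "".join("1" if nt == base else "0" for nt in sequence.upper())


# The example from the definition: cytosine positions within "CATCGC".
code = binary_signature("CATCGC")  # "100101"
```

Such a code records only where one type of nucleotide occurs, which is partial sequence information in the sense of the definition, yet it can still serve to distinguish many polynucleotides from one another.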

As used herein, the terms "gene expression profile" and "gene expression
pattern," which are used equivalently, mean a frequency distribution of sequences of
portions of cDNA molecules sampled from a population of tag-cDNA conjugates. 
Generally, the portions of sequence are sufficiently long to uniquely identify the 
cDNA from which the portion arose. Preferably, the total number of sequences 
determined is at least 1000; more preferably, the total number of sequences 
determined in a gene expression profile is at least ten thousand. 
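
A gene expression profile in this sense can be illustrated with a minimal sketch that turns a list of sequence portions into a frequency distribution and compares a test profile with a control profile. The function names, the `floor` value used to avoid division by zero, and the four-letter signatures are illustrative assumptions, not part of the method as described.

```python
from collections import Counter


def expression_profile(signatures):
    """Return the relative frequency of each signature in the sample."""
    counts = Counter(signatures)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one signature is required")
    return {sig: n / total for sig, n in counts.items()}


def fold_change(test_profile, control_profile, floor=1e-6):
    """Ratio of test to control frequency for every signature seen in either.

    Frequencies below `floor` are raised to `floor` (an assumed convention)
    so that signatures absent from one profile do not divide by zero.
    """
    signatures = set(test_profile) | set(control_profile)
    return {
        sig: max(test_profile.get(sig, 0.0), floor)
        / max(control_profile.get(sig, 0.0), floor)
        for sig in signatures
    }


# Made-up signatures for illustration only.
control = expression_profile(["GATC", "TTAA", "TTAA", "CCGG"])
test = expression_profile(["GATC", "GATC", "TTAA", "CCGG"])
changes = fold_change(test, control)
```

In practice a profile would rest on at least a thousand sequences, as stated above, so that the frequencies of rarer transcripts are estimated with reasonable precision.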

As used herein, "test organism" means any in vitro or in vivo system which 
provides measurable responses to exposure to test compounds. Typically, test
organisms may be mammalian cell cultures, particularly of specific tissues, such as 
hepatocytes, neurons, kidney cells, colony forming cells, or the like, or test organisms 
may be whole animals, such as rats, mice, hamsters, guinea pigs, dogs, cats, rabbits, 
pigs, monkeys, and the like. 



Detailed Description of the Invention
The invention provides a method for determining the toxicity of a compound 
by analyzing changes in the gene expression profiles in selected tissues of test 
organisms exposed to the compound. The invention also provides a method of 
identifying toxicity markers consisting of individual genes or a group of genes that is 
expressed acutely and which is correlated with prolonged or chronic toxicity, or 
suggests that the compound will have an undesirable cross reactivity. Gene 
expression profiles are generated by sequencing portions of cDNA molecules 
constructed from mRNA extracted from tissues of test organisms exposed to the
compound being tested. As used herein, the term "tissue" is employed with its usual 
medical or biological meaning, except that in reference to an in vitro test system, such 
as a cell culture, it simply means a sample from the culture. Gene expression profiles 
derived from test organisms are compared to gene expression profiles derived from 
control organisms to determine the genes which are differentially expressed in the test 
organism because of exposure to the compound being tested. In both cases, the 
sequence information of the gene expression profiles is obtained by massively parallel 
signature sequencing of cDNAs, which is implemented in steps (c) through (f) of the 
above method. 

Toxicity Assessment 
Procedures for designing and conducting toxicity tests in in vitro and in vivo
systems are well known, and are described in many texts on the subject, such as Loomis
et al., Loomis's Essentials of Toxicology, 4th Ed. (Academic Press, New York, 1996);
Ecobichon, The Basics of Toxicity Testing (CRC Press, Boca Raton, 1992); Frazier,
editor, In Vitro Toxicity Testing (Marcel Dekker, New York, 1992); and the like.

In toxicity testing, two groups of test organisms are usually employed: one
group serves as a control and the other group receives the test compound in a single
dose (for acute toxicity tests) or a regimen of doses (for prolonged or chronic toxicity
tests). Since in most cases, the extraction of tissue as called for in the method of the
invention requires sacrificing the test animal, both the control group and the group
receiving compound must be large enough to permit removal of animals for sampling
tissues, if it is desired to observe the dynamics of gene expression through the
duration of an experiment. 

In setting up a toxicity study, extensive guidance is provided in the literature 
for selecting the appropriate test organism for the compound being tested, route of 
administration, dose ranges, and the like. Water or physiological saline (0.9% NaCl
in water) is the solvent of choice for the test compound since these solvents permit
administration by a variety of routes. When this is not possible because of solubility
limitations, it is necessary to resort to the use of vegetable oils such as corn oil or
even organic solvents, of which propylene glycol is commonly used. Whenever
possible the use of suspensions or emulsions should be avoided except for oral
administration. Regardless of the route of administration, the volume required to
administer a given dose is limited by the size of the animal that is used. It is desirable
to keep the volume of each dose uniform within and between groups of animals.
When rats or mice are used the volume administered by the oral route should not
exceed 0.005 ml per gram of animal. Even when aqueous or physiological saline
solutions are used for parenteral injection the volumes that are tolerated are limited,
although such solutions are ordinarily thought of as being innocuous. The
intravenous LD50 of distilled water in the mouse is approximately 0.044 ml per gram
and that of isotonic saline is 0.068 ml per gram of mouse. 
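
The oral volume limit stated above is simple arithmetic on body weight. The 
following is a minimal sketch only: the function names and the example animal, 
dose, and concentration are hypothetical, and only the figure of 0.005 ml per 
gram is taken from the text. 

```python
# Oral volume limit quoted in the text for rats or mice (ml per gram of animal).
MAX_ORAL_ML_PER_G = 0.005


def max_oral_volume_ml(body_weight_g):
    """Largest oral dose volume (ml) for an animal of the given weight (g)."""
    return MAX_ORAL_ML_PER_G * body_weight_g


def dose_volume_ml(dose_mg_per_kg, body_weight_g, conc_mg_per_ml):
    """Volume (ml) of dosing solution that delivers dose_mg_per_kg to one animal."""
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0
    return dose_mg / conc_mg_per_ml


# Illustrative values only: a 250 g rat dosed at 100 mg/kg from a 25 mg/ml solution.
limit = max_oral_volume_ml(250)
needed = dose_volume_ml(100, 250, 25)
print(f"limit {limit:.2f} ml, needed {needed:.2f} ml, within limit: {needed <= limit}")
```

If the required volume exceeds the limit, a more concentrated dosing solution 
would be needed, which is where the solubility limitations noted above come in. 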

When a compound is to be administered by inhalation, special techniques for 
generating test atmospheres are necessary. Dose estimation becomes very 
complicated. The methods usually involve aerosolization or nebulization of fluids 
containing the compound. If the agent to be tested is a fluid that has an appreciable 
vapor pressure, it may be administered by passing air through the solution under 
controlled temperature conditions. Under these conditions, dose is estimated from the 
volume of air inhaled per unit time, the temperature of the solution, and the vapor 
pressure of the agent involved. Gases are metered from reservoirs. When particles of 
a solution are to be administered, unless the particle size is less than about 2 um the 
particles will not reach the terminal alveolar sacs in the lungs. A variety of 
apparatuses and chambers are available to perform studies for detecting effects of 
irritant or other toxic endpoints when they are administered by inhalation. The 
preferred method of administering an agent to animals is via the oral route, either by 
intubation or by incorporating the agent in the feed. 
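
The vapor-phase estimate mentioned above (air volume inhaled per unit time, 
solution temperature, vapor pressure) can be sketched as follows. This is an 
illustration under stated assumptions, not a method given in the text: it 
assumes the air leaves the solution saturated with vapor and behaves as an 
ideal gas, and it reports the amount presented to the animal, not the amount 
absorbed. The function names and example values are hypothetical. 

```python
R = 8.314462618  # molar gas constant, J/(mol*K)


def saturated_vapor_conc_mg_per_l(vapor_pressure_pa, molar_mass_g_per_mol, temp_k):
    """Ideal-gas concentration of a saturated vapor, in mg per litre of air.

    n/V = P/(R*T) gives mol per cubic metre; multiplying by g/mol gives
    g per cubic metre, which is numerically equal to mg per litre.
    """
    return vapor_pressure_pa * molar_mass_g_per_mol / (R * temp_k)


def inhaled_amount_mg(conc_mg_per_l, minute_volume_l, minutes):
    """Amount presented: concentration x air inhaled per minute x exposure time."""
    return conc_mg_per_l * minute_volume_l * minutes


# Hypothetical agent: vapor pressure 1000 Pa at 25 C, molar mass 100 g/mol.
conc = saturated_vapor_conc_mg_per_l(1000.0, 100.0, 298.15)
print(f"{conc:.2f} mg/L; {inhaled_amount_mg(conc, 0.025, 60):.1f} mg presented in 1 h")
```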

Preferably, in designing a toxicity assessment, two or more species should be 
employed that handle the test compound as similarly to man as possible in terms of 
metabolism, absorption, excretion, tissue storage, and the like. Preferably, multiple 
doses or regimens at different concentrations should be employed to establish a dose- 
response relationship with respect to toxic effects. And preferably, the route of 
administration to the test animal should be the same as, or as similar as possible to, 
the route of administration of the compound to man. Effects obtained by one route of 
administration to test animals are not a priori applicable to effects by another route of 
administration to man. For example, food additives for man should be tested by 
admixture of the material in the diet of the test animals. 

Acute toxicity tests consist of administering a compound to test organisms on 
one occasion. The purpose of such a test is to determine the symptomatology 
consequent to administration of the compound and to determine the degree of lethality 
of the compound. The initial procedure is to perform a series of range-finding doses 
of the compound in a single species. This necessitates selection of a route of 
administration, preparation of the compound in a form suitable for administration by 
the selected route, and selection of an appropriate species. Preferably, initial acute 
toxicity studies are performed on either rats or mice because of their low cost, their 
availability, and the availability of abundant toxicologic reference data on these 
species. Prolonged toxicity tests consist of administering a compound to test 
organisms repeatedly, usually on a daily basis, over a period of 3 to 4 months. Two 
practical factors are encountered that place constraints on the design of such tests: 
First, the available routes of administration are limited because the route selected 
must be suitable for repeated administration without inducing harmful effects. And 
second, blood, urine, and perhaps other samples, should be taken repeatedly without 
inducing significant harm to the test animals. Preferably, in the method of the 
invention the gene expression profiles are obtained in conjunction with the 
measurement of the traditional toxicologic parameters, such as listed in the table 
below: 

Hematology 
erythrocyte count 
total leukocyte count 
differential leukocyte count 
hematocrit 
hemoglobin 

Blood Chemistry 
sodium 
potassium 
chloride 
calcium 
carbon dioxide 
serum glutamic-pyruvic transaminase 
serum glutamic-oxalacetic transaminase 
serum protein electrophoresis 
blood sugar 
blood urea nitrogen 
total serum protein 
serum albumin 
total serum bilirubin 

Urine Analyses 
pH 
specific gravity 
total protein 
sediment 
glucose 
ketones 
bilirubin 

[...] Preferably, tag complements are attached to solid phase supports. Such 
oligonucleotide tags when used with their corresponding tag complements [...] 
tracking, or labeling molecules, especially polynucleotides. 

Minimally cross-hybridizing sets of oligonucleotide tags and tag complements 
[...] the degree to which cross-hybridization is sought to be minimized (or 
stated another way, the degree to which specificity is sought to be enhanced). For 
example, a minimally cross-hybridizing set may consist of a set of individually 
synthesized 10-mer sequences that differ from each other by at least 4 nucleotides, 
such set having a maximum size of 332 (when composed of 3 kinds of nucleotides 
and counted using a computer program such as disclosed in Appendix Ic). 
Alternatively, a minimally cross-hybridizing set of oligonucleotide tags may also be 
assembled combinatorially from subunits which themselves are selected from a 
minimally cross-hybridizing set. For example, a set of minimally cross-hybridizing 
12-mers differing from one another by at least three nucleotides may be synthesized 
by assembling 3 subunits selected from a set of minimally cross-hybridizing 4-mers 
that each differ from one another by three nucleotides. Such an embodiment gives a 
maximally sized set of 9^3, or 729, 12-mers. The number 9 is the number of 
oligonucleotides listed by the computer program of Appendix Ia, which assumes, as 
with the 10-mers, that only 3 of the 4 different types of nucleotides are used. The set 
is described as "maximal" because the computer programs of Appendices Ia-c provide 
the largest set for a given input (e.g. length, composition, difference in number of 
nucleotides between members). Additional minimally cross-hybridizing sets may be 
formed from subsets of such calculated sets. 
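
The combinatorial assembly just described can be checked by brute force. The 
sketch below is illustrative only and is not one of the programs of the 
Appendices; it uses the nine 4-mers listed as Set 2 of Table III (built from A, 
C, and G), and the function names are hypothetical. It assembles every 12-mer 
of three subunits and finds the smallest number of differences between any two 
of them. 

```python
from itertools import combinations, product

# Nine 4-mer words built from A, C and G (Set 2 of Table III in this document).
WORDS = ["ACCC", "AGGG", "CACG", "CCGA", "CGAC", "GAGC", "GCAG", "GGCA", "AAAA"]


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assemble_tags(words, subunits_per_tag):
    """Every tag obtainable by concatenating subunits_per_tag words."""
    return ["".join(combo) for combo in product(words, repeat=subunits_per_tag)]


tags = assemble_tags(WORDS, 3)
closest = min(mismatches(a, b) for a, b in combinations(tags, 2))
print(len(tags), closest)  # prints: 729 3
```

Two tags that differ in only one subunit position differ by exactly the 
word-to-word difference, which is why the assembled set inherits the minimum of 
three. 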

Oligonucleotide tags may be single stranded and be designed for specific 
hybridization to single stranded tag complements by duplex formation or for specific 
hybridization to double stranded tag complements by triplex formation. 

Oligonucleotide tags may also be double stranded and be designed for specific 
hybridization to single stranded tag complements by triplex formation. 

When synthesized combinatorially, an oligonucleotide tag preferably consists 
of a plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 9 
nucleotides in length wherein each subunit is selected from the same minimally cross- 
hybridizing set. In such embodiments, the number of oligonucleotide tags available 
depends on the number of subunits per tag and on the length of the subunits. The 
number is generally much less than the number of all possible sequences the length of 
the tag, which for a tag n nucleotides long would be 4^n. 
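
The comparison between the number of available tags and the number of all 
possible sequences can be made concrete. The sketch below is illustrative; the 
function names are hypothetical, and the example uses the figures of nine 4-mer 
words and three subunits per tag given earlier in the text. 

```python
def repertoire_size(words_in_set, subunits_per_tag):
    """Distinct tags when each subunit position may hold any word of the set."""
    return words_in_set ** subunits_per_tag


def all_sequences(tag_length):
    """All possible sequences of the given length over four nucleotides (4^n)."""
    return 4 ** tag_length


# Nine 4-mer words, three subunits per tag: a tag 12 nucleotides long.
tags = repertoire_size(9, 3)
total = all_sequences(12)
print(tags, total, f"fraction used: {tags / total:.1e}")
```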

Complements of oligonucleotide tags attached to a solid phase support are 
used to sort polynucleotides from a mixture of polynucleotides each containing a tag. 
Complements of the oligonucleotide tags are synthesized on the surface of a solid 
phase support, such as a microscopic bead or a specific location on an array of 
synthesis locations on a single support, such that populations of identical sequences 
are produced in specific regions. That is, the surface of each support, in the case of a 
bead, or of each region, in the case of an array, is derivatized by only one type of 
complement which has a particular sequence. The population of such beads or regions 
contains a repertoire of complements with distinct sequences. As used herein in 
reference to oligonucleotide tags and tag complements, the term "repertoire" means 
the set of minimally cross-hybridizing oligonucleotides that make up the tags in 
a particular embodiment or the corresponding set of tag complements. 

The polynucleotides to be sorted each have an oligonucleotide tag attached, 
such that different polynucleotides have different tags. As explained more fully 
below, this condition is achieved by employing a repertoire of tags substantially 
greater than the population of polynucleotides and by taking a sufficiently small 
sample of tagged polynucleotides from the full ensemble of tagged polynucleotides. 
After such sampling, when the populations of supports and polynucleotides are mixed 
under conditions which permit specific hybridization of the oligonucleotide tags with 
their respective complements, identical polynucleotides sort onto particular beads or 
regions. 
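
Why a small sample from a large repertoire keeps tags distinct can be 
illustrated with a simple probability model. The model is an assumption made 
here for illustration and is not stated in the text: each sampled 
tag-polynucleotide conjugate is taken to carry a tag drawn uniformly and 
independently from the repertoire. The function name and the example numbers 
are hypothetical. 

```python
def fraction_uniquely_tagged(repertoire_size, sample_size):
    """Expected fraction of sampled conjugates whose tag is on no other sampled conjugate.

    Assumes uniform, independent tag assignment (a simplifying model).
    A given conjugate is unique if each of the other sample_size - 1
    conjugates carries a different tag.
    """
    return (1.0 - 1.0 / repertoire_size) ** (sample_size - 1)


# Hypothetical repertoire of 8^8 tags, sampled at roughly 1% and 10% of its size.
repertoire = 8 ** 8
for sample in (167772, 1677721):
    print(sample, round(fraction_uniquely_tagged(repertoire, sample), 4))
```

Under this model the fraction of conjugates sharing a tag falls roughly in 
proportion to the ratio of sample size to repertoire size. 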

The nucleotide sequences of oligonucleotides of a minimally cross-hybridizing 
set are conveniently enumerated by simple computer programs, such as those 
exemplified by programs whose source codes are listed in Appendices Ia and Ib. 
Program minhx of Appendix Ia computes all minimally cross-hybridizing sets having 
4-mer subunits composed of three kinds of nucleotides. Program tagN of Appendix 
Ib enumerates longer oligonucleotides of a minimally cross-hybridizing set. Similar 
algorithms and computer programs are readily written for listing oligonucleotides of 
minimally cross-hybridizing sets for any embodiment of the invention. Table I below 
provides guidance as to the size of sets of minimally cross-hybridizing 
oligonucleotides for the indicated lengths and number of nucleotide differences. The 
above computer programs were used to generate the numbers. 

                                    Table I 

                  Nucleotide 
                  Difference 
                  between 
                  Oligonucleotides    Maximal Size 
Oligonucleotide   of Minimally        of Minimally     Size of          Size of 
Word              Cross-              Cross-           Repertoire       Repertoire 
Length            Hybridizing Set     Hybridizing Set  with Four Words  with Five Words 

 4                 3                     9             6561             5.90 x 10^4 
 6                 3                    27             5.3 x 10^5       1.43 x 10^7 
 7                 4                    27             5.3 x 10^5       1.43 x 10^7 
 7                 5                     8             4096             3.28 x 10^4 
 8                 3                   190             1.30 x 10^9      2.48 x 10^11 
 8                 4                    62             1.48 x 10^7      9.16 x 10^8 
 8                 5                    18             1.05 x 10^5      1.89 x 10^6 
 9                 5                    39             2.31 x 10^6      9.02 x 10^7 
10                 5                   332             1.21 x 10^10 
10                 6                    28             6.15 x 10^5      1.72 x 10^7 
11                 5                   187 
18                 6                ~25000 
18                12                    24 

For some embodiments of the invention, where extremely large repertoires of 
tags are not required, oligonucleotide tags of a minimally cross-hybridizing set may 
be separately synthesized. Sets containing several hundred to several thousands, or 
even several tens of thousands, of oligonucleotides may be synthesized directly by a 
variety of parallel synthesis approaches, e.g. as disclosed in Frank et al, U.S. patent 
4,689,405; Frank et al, Nucleic Acids Research, 11: 4365-4377 (1983); Matson et al, 
Anal. Biochem., 224: 110-116 (1995); Fodor et al, International application 
PCT/US93/04145; Pease et al, Proc. Natl. Acad. Sci., 91: 5022-5026 (1994); 
Southern et al, J. Biotechnology, 35: 217-227 (1994); Brennan, International 
application PCT/US94/05896; Lashkari et al, Proc. Natl. Acad. Sci., 92: 7912-7915 
(1995); or the like. 

Preferably, oligonucleotide tags of the invention are synthesized 
combinatorially out of subunits between three and six nucleotides in length and 
selected from the same minimally cross-hybridizing set. For oligonucleotides in this 
range, the members of such sets may be enumerated by computer programs based on 
the algorithm of Fig. 1. 

The algorithm of Fig. 1 is implemented by first defining the characteristics of 
the subunits of the minimally cross-hybridizing set, i.e. length, number of base 
differences between members, and composition, e.g. do they consist of two, three, or 
four kinds of bases. A table M_n, n=1, is generated (100) that consists of all possible 
sequences of a given length and composition. An initial subunit S_1 is selected and 
compared (120) with successive subunits S_i for i=n+1 to the end of the table. 
Whenever a successive subunit has the required number of mismatches to be a 
member of the minimally cross-hybridizing set, it is saved in a new table M_(n+1) (125), 
that also contains subunits previously selected in prior passes through step 120. For 
example, in the first set of comparisons, M_2 will contain S_1; in the second set of 
comparisons, M_3 will contain S_1 and S_2; in the third set of comparisons, M_4 will 
contain S_1, S_2, and S_3; and so on. Similarly, comparisons in table M_j will be 
between S_j and all successive subunits in M_j. Note that each successive table M_(n+1) 
is smaller than its predecessors as subunits are eliminated in successive passes 
through step 130. After every subunit of table M_n has been compared (140) the old 
table is replaced by the new table M_(n+1), and the next round of comparisons is 
begun. The process stops (160) when a table M_n is reached that contains no 
successive subunits to compare to the selected subunit S_i, i.e. M_n = M_(n+1). 
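
The table-winnowing procedure described above can be sketched in a few lines. 
This is an illustrative reading of the description, not the source code of the 
Appendices; the function names are hypothetical, and the order in which 
candidate sequences are listed (alphabetical, in the order the bases are 
supplied) is an assumption that affects which set is found. 

```python
from itertools import product


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def minimally_cross_hybridizing_set(length, min_mismatches, alphabet):
    """Winnow the table of all sequences down to a mutually distant set."""
    # Table M_1: every sequence of the given length and composition.
    table = ["".join(p) for p in product(alphabet, repeat=length)]
    n = 0
    while n < len(table):
        selected = table[n]  # subunit S_(n+1) chosen for this pass
        # New table: subunits already selected, then only those successors
        # that have the required number of mismatches with the selection.
        table = table[: n + 1] + [
            s for s in table[n + 1:] if mismatches(selected, s) >= min_mismatches
        ]
        n += 1
    # No successive subunits remain to compare, so the table is final.
    return table


words = minimally_cross_hybridizing_set(4, 3, "ACG")
print(len(words), words)
```

For 4-mers of A, C, and G differing by at least three bases, this sketch 
returns nine words, the same nine that appear as Set 2 of Table III. 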

Preferably, minimally cross-hybridizing sets comprise subunits that make 
approximately equivalent contributions to duplex stability as every other subunit in 
the set. In this way, the stability of perfectly matched duplexes between every subunit 
and its complement is approximately equal. Guidance for selecting such sets is 
provided by published techniques for selecting optimal PCR primers and calculating 
duplex stabilities, e.g. Rychlik et al, Nucleic Acids Research, 17: 8543-8551 (1989) 
and 18: 6409-6412 (1990); Breslauer et al, Proc. Natl. Acad. Sci., 83: 3746-3750 
(1986); Wetmur, Crit. Rev. Biochem. Mol. Biol., 26: 227-259 (1991); and the like. 
For shorter tags, e.g. about 30 nucleotides or less, the algorithm described by Rychlik 
and Wetmur is preferred, and for longer tags, e.g. about 30-35 nucleotides or greater, 
an algorithm disclosed by Suggs et al, pages 683-693 in Brown, editor, ICN-UCLA 
Symp. Dev. Biol., Vol. 23 (Academic Press, New York, 1981) may be conveniently 
employed. Clearly, there are many approaches available to one skilled in the art for 
designing sets of minimally cross-hybridizing subunits within the scope of the 
invention. For example, to minimize the effects of different base-stacking energies of 
terminal nucleotides when subunits are assembled, subunits may be provided that 
have the same terminal nucleotides. In this way, when subunits are linked, the sum of 
the base-stacking energies of all the adjoining terminal nucleotides will be the same, 
thereby reducing or eliminating variability in tag melting temperatures. 

A "word" of terminal nucleotides, shown in italic below, may also be added to 
each end of a tag so that a perfect match is always formed between it and a similar 
terminal "word" on any other tag complement. Such an augmented tag would have 
the form: 

        W     W1     W2    ...   Wk-1     Wk     W 
        W'    W1'    W2'   ...   Wk-1'    Wk'    W' 

where the primed W's indicate complements. With ends of tags always forming 
perfectly matched duplexes, all mismatched words will be internal mismatches, 
thereby reducing the stability of tag-complement duplexes that otherwise would have 
mismatched words at their ends. It is well known that duplexes with internal 
mismatches are significantly less stable than duplexes with the same mismatch at a 
terminus. 

A preferred embodiment of minimally cross-hybridizing sets are those whose 
subunits are made up of three of the four natural nucleotides. As will be discussed 
more fully below, the absence of one type of nucleotide in the oligonucleotide tags 
permits target polynucleotides to be loaded onto solid phase supports by use of the 
3'->5' exonuclease activity of a DNA polymerase. The following is an exemplary 
minimally cross-hybridizing set of subunits each comprising four nucleotides selected 
from the group consisting of A, G, and T: 

                               Table II 

        Word:          w1        w2        w3        w4 
        Sequence:      GATT      TGAT      TAGA      TTTG 

        Word:          w5        w6        w7        w8 
        Sequence:      GTAA      AGTA      ATGT      AAAG 


In this set, each member would form a duplex having three mismatched bases with 
the complement of every other member. 
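
The statement about three mismatched bases can be verified directly. The sketch 
below is illustrative; the function names are hypothetical, and the sixth word 
of Table II is read as AGTA on the assumption that every word in the set is 
drawn from A, G, and T as stated. 

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Words w1..w8 of Table II; w6 is read as AGTA (assumption, see lead-in).
TABLE_II = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]


def reverse_complement(seq):
    """Watson-Crick complement of seq, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def duplex_mismatches(tag, complement):
    """Mismatched base pairs when tag anneals antiparallel to complement."""
    return sum(PAIR[t] != c for t, c in zip(tag, reversed(complement)))


# Each word against the complement of every other word in the set.
counts = {
    duplex_mismatches(a, reverse_complement(b))
    for a in TABLE_II
    for b in TABLE_II
    if a != b
}
print(counts)  # prints: {3}
```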

Further exemplary minimally cross-hybridizing sets are listed below in Table 
III. Clearly, additional sets can be generated by substituting different groups of 
nucleotides, or by using subsets of known minimally cross-hybridizing sets. 



                               Table III 

      Exemplary Minimally Cross-Hybridizing Sets of 4-mer Subunits 

Set 1      Set 2      Set 3      Set 4      Set 5      Set 6 
CATT       ACCC       AAAC       AAAG       AACA       AACG 
CTAA       AGGG       ACCA       ACCA       ACAC       ACAA 
TCAT       CACG       AGGG       AGGC       AGGG       AGGC 
ACTA       CCGA       CACG       CACC       CAAG       CAAC 
TACA       CGAC       CCGC       CCGG       CCGC       CCGG 
TTTC       GAGC       CGAA       CGAA       CGC[?]     CGCA 
ATCT       GCAG       GAGA       GAGA       GAGA       GAGA 
AAAC       GGCA       GCAG       GCAC       GCCG       GCCC 
           AAAA       GGCC       GGCG       GGAC       GGAG 

Set 7      Set 8      Set 9 
AAGA       [...]      AAGG 
ACAC       [...]      ACAA 
AGCG       [...]      AGCC 
CAAG       [...]      CAAC 
CCCA       CCCC       CCCG 
CGGC       CGGA       CGGA 
GACC       GACA       GACA 
GCGG       GCGG       GCGC 
GGAA       GGAC       GGAG 

Set 10     Set 11     Set 12 
ACAG       ACCG       ACGA 
AACA       AAAA       AAAC 
AGGC       AGGC       AGCG 
CAAC       CACC       CACA 
CCGA       CCGA       CCAG 
CGCG       CGAG       CGGC 
GAGG       GAGG       GAGG 
GCCC       GCAC       GCCC 
GGAA       GGCA       GGAA 


The oligonucleotide tags of the invention and their complements are 
conveniently synthesized on an automated DNA synthesizer, e.g. an Applied 
Biosystems, Inc. (Foster City, California) model 392 or 394 DNA/RNA Synthesizer, 
using standard chemistries, such as phosphoramidite chemistry, e.g. disclosed in the 
following references: Beaucage and Iyer, Tetrahedron, 48: 2223-2311 (1992); Molko 
et al, U.S. patent 4,980,460; Koster et al, U.S. patent 4,725,677; Caruthers et al, U.S. 
patents 4,415,732; 4,458,066; and 4,973,679; and the like. Alternative chemistries, 
e.g. resulting in non-natural backbone groups, such as phosphorothioate, 
phosphoramidate, and the like, may also be employed provided that the resulting 
oligonucleotides are capable of specific hybridization. In some embodiments, tags 
may comprise naturally occurring nucleotides that permit processing or manipulation 
by enzymes, while the corresponding tag complements may comprise non-natural 
nucleotide analogs, such as peptide nucleic acids, or like compounds, that promote the 
formation of more stable duplexes during sorting. 

When microparticles are used as supports, repertoires of oligonucleotide tags 
and tag complements may be generated by subunit-wise synthesis via "split and mix" 
techniques, e.g. as disclosed in Shortle et al, International patent application 
PCT/US93/03418 or Lyttle et al, Biotechniques, 19: 274-280 (1995). Briefly, the 
basic unit of the synthesis is a subunit of the oligonucleotide tag. Preferably, 
phosphoramidite chemistry is used and 3' phosphoramidite oligonucleotides are 
prepared for each subunit in a minimally cross-hybridizing set, e.g. for the set first 
listed above, there would be eight 4-mer 3'-phosphoramidites. Synthesis proceeds as 
disclosed by Shortle et al or in direct analogy with the techniques employed to 
generate diverse oligonucleotide libraries using nucleosidic monomers, e.g. as 
disclosed in Telenius et al, Genomics, 13: 718-725 (1992); Welsh et al, Nucleic Acids 
Research, 19: 5275-5279 (1991); Grothues et al, Nucleic Acids Research, 21: 1321- 
1322 (1993); Hartley, European patent application 90304496.4; Lam et al, Nature, 
354: 82-84 (1991); Zuckerman et al, Int. J. Pept. Protein Research, 40: 498-507 
(1992); and the like. Generally, these techniques simply call for the application of 
mixtures of the activated monomers to the growing oligonucleotide during the 
coupling steps. Preferably, oligonucleotide tags and tag complements are synthesized 
on a DNA synthesizer having a number of synthesis chambers which is greater than or 
equal to the number of different kinds of words used in the construction of the tags. 
That is, preferably there is a synthesis chamber corresponding to each type of word. 
In this embodiment, words are added nucleotide-by-nucleotide, such that if a word 
consists of five nucleotides there are five monomer couplings in each synthesis 
chamber. After a word is completely synthesized, the synthesis supports are removed 
from the chambers, mixed, and redistributed back to the chambers for the next cycle 
of word addition. This latter embodiment takes advantage of the high coupling yields 
of monomer addition, e.g. in phosphoramidite chemistries. 
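
The remove, mix, and redistribute cycle can be pictured with a small 
simulation. This is a sketch for illustration only: the function name, the 
number of supports, and the even redistribution among chambers are assumptions, 
and the four words are simply the first four of Table II. Chemistry and 
coupling yields are not modeled. 

```python
import random


def split_and_mix(words, rounds, beads_per_chamber, seed=0):
    """Simulate word-wise synthesis with one chamber per word.

    Each round the pooled supports are mixed, dealt evenly to the chambers,
    and every support in a chamber is extended by that chamber's word.
    """
    rng = random.Random(seed)
    beads = [""] * (len(words) * beads_per_chamber)
    for _ in range(rounds):
        rng.shuffle(beads)  # mix the pooled supports
        for i in range(len(beads)):
            chamber = i % len(words)  # redistribute evenly among chambers
            beads[i] += words[chamber]  # the chamber adds its own word
    return beads


beads = split_and_mix(["GATT", "TGAT", "TAGA", "TTTG"], rounds=3, beads_per_chamber=100)
print(len(beads), "supports;", len(set(beads)), "distinct tags; example:", beads[0])
```

Each simulated support ends with a single tag sequence, while the pool as a 
whole covers many of the possible word combinations. 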

Double stranded forms of tags may be made by separately synthesizing the 
complementary strands followed by mixing under conditions that permit duplex 
formation. Alternatively, double stranded tags may be formed by first synthesizing a 
single stranded repertoire linked to a known oligonucleotide sequence that serves as a 
primer binding site. The second strand is then synthesized by combining the single 
stranded repertoire with a primer and extending with a polymerase. This latter 
approach is described in Oliphant et al, Gene, 44: 177-183 (1986). Such duplex tags 
may then be inserted into cloning vectors along with target polynucleotides for sorting 
and manipulation of the target polynucleotide in accordance with the invention. 

When tag complements are employed that are made up of nucleotides that 
have enhanced binding characteristics, such as PNAs or oligonucleotide N3'->P5' 
phosphoramidates, sorting can be implemented through the formation of D-loops 
between tags comprising natural nucleotides and their PNA or phosphoramidate 
complements, as an alternative to the "stripping" reaction employing the 3'->5' 
exonuclease activity of a DNA polymerase to render a tag single stranded. 

Oligonucleotide tags of the invention may range in length from 12 to 60 
nucleotides or basepairs. Preferably, oligonucleotide tags range in length from 18 to 
40 nucleotides or basepairs. More preferably, oligonucleotide tags range in length 
from 25 to 40 nucleotides or basepairs. In terms of preferred and more preferred 
numbers of subunits, these ranges may be expressed as follows: 

                               Table IV 

        Numbers of Subunits in Tags in Preferred Embodiments 

Monomers 
in Subunit             Nucleotides in Oligonucleotide Tag 
                  (12-60)            (18-40)            (25-40) 

    3             4-20 subunits      6-13 subunits      8-13 subunits 
    4             3-15 subunits      4-10 subunits      6-10 subunits 
    5             2-12 subunits      3-8 subunits       5-8 subunits 
    6             2-10 subunits      3-6 subunits       4-6 subunits 

Most preferably, oligonucleotide tags are single stranded and specific hybridization 
occurs via Watson-Crick pairing with a tag complement. 
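
The subunit counts of Table IV appear to follow from dividing each tag length 
limit by the subunit length and discarding any remainder. That rule is an 
inference from the tabulated numbers (taking the first row to be 3-monomer 
subunits), not something the text states; the function name is hypothetical. 

```python
def subunit_range(monomers_per_subunit, min_length, max_length):
    """Whole-number subunit counts for a tag length range, rounding down at both ends."""
    return min_length // monomers_per_subunit, max_length // monomers_per_subunit


LENGTH_RANGES = ((12, 60), (18, 40), (25, 40))
for monomers in (3, 4, 5, 6):
    row = [subunit_range(monomers, lo, hi) for lo, hi in LENGTH_RANGES]
    print(monomers, row)
```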

Preferably, repertoires of single stranded oligonucleotide tags of the invention 
contain at least 100 members; more preferably, repertoires of such tags contain at 
least 1000 members; and most preferably, repertoires of such tags contain at least 
10,000 members. 

Triplex Tags 

In embodiments where specific hybridization occurs via triplex formation, 
coding of tag sequences follows the same principles as for duplex-forming tags; 
however, there are further constraints on the selection of subunit sequences. 
Generally, third strand association via Hoogsteen type of binding is most stable along 
homopyrimidine-homopurine tracks in a double stranded target. Usually, base triplets 
form in T-A*T or C-G*C motifs (where "-" indicates Watson-Crick pairing and "*" 
indicates Hoogsteen type of binding); however, other motifs are also possible. For 
example, Hoogsteen base pairing permits parallel and antiparallel orientations 
between the third strand (the Hoogsteen strand) and the purine-rich strand of the 
duplex to which the third strand binds, depending on conditions and the composition 
of the strands. There is extensive guidance in the literature for selecting appropriate 
sequences, orientation, conditions, nucleoside type (e.g. whether ribose or 
deoxyribose nucleosides are employed), base modifications (e.g. methylated cytosine, 
and the like) in order to maximize, or otherwise regulate, triplex stability as desired in 
particular embodiments, e.g. Roberts et al, Proc. Natl. Acad. Sci., 88: 9397-9401 
(1991); Roberts et al, Science, 258: 1463-1466 (1992); Roberts et al, Proc. Natl. 
Acad. Sci., 93: 4320-4325 (1996); Distefano et al, Proc. Natl. Acad. Sci., 90: 1179- 
1183 (1993); Mergny et al, Biochemistry, 30: 9791-9798 (1991); Cheng et al, J. Am. 
Chem. Soc., 114: 4465-4474 (1992); Beal and Dervan, Nucleic Acids Research, 20: 
2773-2776 (1992); Beal and Dervan, J. Am. Chem. Soc., 114: 4976-4982 (1992); 
Giovannangeli et al, Proc. Natl. Acad. Sci., 89: 8631-8635 (1992); Moser and Dervan, 
Science, 238: 645-650 (1987); McShan et al, J. Biol. Chem., 267: 5712-5721 (1992); 
Yoon et al, Proc. Natl. Acad. Sci., 89: 3840-3844 (1992); Blume et al, Nucleic Acids 
Research, 20: 1777-1784 (1992); Thuong and Helene, Angew. Chem. Int. Ed. Engl., 
32: 666-690 (1993); Escude et al, Proc. Natl. Acad. Sci., 93: 4365-4369 (1996); and 
the like. Conditions for annealing single-stranded or duplex tags to their single- 
stranded or duplex complements are well known, e.g. Ji et al, Anal. Chem., 65: 1323- 
1328 (1993); Cantor et al, U.S. patent 5,482,836; and the like. Use of triplex tags has 
the advantage of not requiring a "stripping" reaction with polymerase to expose the 
tag for annealing to its complement. 

Preferably, oligonucleotide tags of the invention employing triplex 
hybridization are double stranded DNA and the corresponding tag complements are 
single stranded. More preferably, 5-methylcytosine is used in place of cytosine in the 
tag complements in order to broaden the range of pH stability of the triplex formed 
between a tag and its complement. Preferred conditions for forming triplexes are 
fully disclosed in the above references. Briefly, hybridization takes place in 
concentrated salt solution, e.g. 1.0 M NaCl, 1.0 M potassium acetate, or the like, at 
pH below 5.5 (or 6.5 if 5-methylcytosine is employed). Hybridization temperature 
depends on the length and composition of the tag; however, for an 18-20-mer tag or 
longer, hybridization at room temperature is adequate. Washes may be conducted 
with less concentrated salt solutions, e.g. 10 mM sodium acetate, 100 mM MgCl2, pH 
5.8, at room temperature. Tags may be eluted from their tag complements by 
incubation in a similar salt solution at pH 9.0. 

Minimally cross-hybridizing sets of oligonucleotide tags that form triplexes 
may be generated by the computer program of Appendix Ic, or similar programs. An 
exemplary set of double stranded 8-mer words is listed below in capital letters with 
the corresponding complements in small letters. Each such word differs from each of 
the other words in the set by at least three base pairs. 



                                Table V 

                Exemplary Minimally Cross-Hybridizing 
                 Set of Double Stranded 8-mer Tags 

5'-AAGGAGAG      5'-AAAGGGGA      5'-AGAGAAGA      5'-AGGGGGGG 
3'-TTCCTCTC      3'-TTTCCCCT      3'-TCTCTTCT      3'-TCCCCCCC 
3'-ttcctctc      3'-tttcccct      3'-tctcttct      3'-tccccccc 

5'-AAAAAAAA      5'-AAGAGAGA      5'-AGGAAAAG      5'-GAAAGGAG 
3'-TTTTTTTT      3'-TTCTCTCT      3'-TCCTTTTC      3'-CTTTCCTC 
3'-tttttttt      3'-ttctctct      3'-tccttttc      3'-ctttcctc 

5'-AAAAAGGG      5'-AGAAGAGG      5'-AGGAAGGA      5'-GAAGAAGG 
3'-TTTTTCCC      3'-TCTTCTCC      3'-TCCTTCCT      3'-CTTCTTCC 
3'-tttttccc      3'-tcttctcc      3'-tccttcct      3'-cttcttcc 

5'-AAAGGAAG      5'-AGAAGGAA      5'-AGGGGAAA      5'-GAAGAGAA 
3'-TTTCCTTC      3'-TCTTCCTT      3'-TCCCCTTT      3'-CTTCTCTT 
3'-tttccttc      3'-tcttcctt      3'-tccccttt      3'-cttctctt 


                                Table VI 

            Repertoire Size of Various Double Stranded Tags 
            That Form Triplexes with Their Tag Complements 

                  Nucleotide 
                  Difference 
                  between 
                  Oligonucleotides    Maximal Size 
Oligonucleotide   of Minimally        of Minimally     Size of          Size of 
Word              Cross-              Cross-           Repertoire       Repertoire 
Length            Hybridizing Set     Hybridizing Set  with Four Words  with Five Words 

 4                 2                     8             4096             3.2 x 10^4 
 6                 3                     8             4096             3.2 x 10^4 
 8                 3                    16             6.5 x 10^4       1.05 x 10^6 
10                 5                     8             4096 
15                 5                    92 
20                 6                   765 
20                 8                    92 
20                10                   [illegible] 

Preferably, repertoires of double stranded oligonucleotide tags of the invention 
contain at least 10 members; more preferably, repertoires of such tags contain at least 
100 members. Preferably, words are between 4 and 8 nucleotides in length for 
combinatorially synthesized double stranded oligonucleotide tags, and oligonucleotide 
tags are between 12 and 60 base pairs in length. More preferably, such tags are 
between 18 and 40 base pairs in length. 

Solid Phase Supports 
Solid phase supports for use with the invention may have a wide variety of 
forms, including microparticles, beads, and membranes, slides, plates, micromachined 
chips, and the like. Likewise, solid phase supports of the invention may comprise a 
wide variety of compositions, including glass, plastic, silicon, alkanethiolate- 
derivatized gold, cellulose, low cross-linked and high cross-linked polystyrene, silica 
gel, polyamide, and the like. Preferably, either a population of discrete particles is 
employed such that each has a uniform coating, or population, of complementary 
sequences of the same tag (and no other), or a single or a few supports are employed 
with spatially discrete regions each containing a uniform coating, or population, of 
complementary sequences to the same tag (and no other). In the latter embodiment, 
the area of the regions may vary according to particular applications; usually, the 
regions range in area from several um^2, e.g. 3-5, to several hundred um^2, e.g. 100- 
500. Preferably, such regions are spatially discrete so that signals generated by 
events, e.g. fluorescent emissions, at adjacent regions can be resolved by the detection 
system being employed. In some applications, it may be desirable to have regions 
with uniform coatings of more than one tag complement, e.g. for simultaneous 
sequence analysis, or for bringing separately tagged molecules into close proximity. 

Tag complements may be used with the solid phase support that they are 
synthesized on, or they may be separately synthesized and attached to a solid phase 
support for use, e.g. as disclosed by Lund et al, Nucleic Acids Research, 16: 10861- 
10880 (1988); Albretsen et al, Anal. Biochem., 189: 40-50 (1990); Wolf et al, Nucleic 
Acids Research, 15: 2911-2926 (1987); or Ghosh et al, Nucleic Acids Research, 15: 
5353-5372 (1987). Preferably, tag complements are synthesized on and used with the 
same solid phase support, which may comprise a variety of forms and include a 
variety of linking moieties. Such supports may comprise microparticles or arrays, or 
matrices, of regions where uniform populations of tag complements are synthesized. 
A wide variety of microparticle supports may be used with the invention, including 
microparticles made of controlled pore glass (CPG), highly cross-linked polystyrene, 
acrylic copolymers, cellulose, nylon, dextran, latex, polyacrolein, and the like, 
disclosed in the following exemplary references: Meth. Enzymol., Section A, pages 
11-147, vol. 44 (Academic Press, New York, 1976); U.S. patents 4,678,814; 
4,413,070; and 4,046,720; and Pon, Chapter 19, in Agrawal, editor, Methods in 
Molecular Biology, Vol. 20, (Humana Press, Totowa, NJ, 1993). Microparticle 
supports further include commercially available nucleoside-derivatized CPG and 
polystyrene beads (e.g. available from Applied Biosystems, Foster City, CA); 
derivatized magnetic beads; polystyrene grafted with polyethylene glycol (e.g. 
TentaGel(TM), Rapp Polymere, Tubingen, Germany); and the like. Selection of the 
support characteristics, such as material, porosity, size, shape, and the like, and the 
type of linking moiety employed depends on the conditions under which the tags are 
used. For example, in applications involving successive processing with enzymes, 
supports and linkers that minimize steric hindrance of the enzymes and that facilitate 
access to substrate are preferred. Other important factors to be considered in selecting 
the most appropriate microparticle support include size uniformity, efficiency as a 
synthesis support, degree to which surface area is known, and optical properties, e.g. as 
explained more fully below, clear smooth beads provide instrumentational advantages 
when handling large numbers of beads on a surface. 

Exemplary linking moieties for attaching and/or synthesizing tags on
microparticle surfaces are disclosed in Pon et al, Biotechniques, 6: 768-775 (1988);
Webb, U.S. patent 4,659,774; Barany et al, International patent application
PCT/US91/06103; Brown et al, J. Chem. Soc. Commun., 1989: 891-893; Damha et
al, Nucleic Acids Research, 18: 3813-3821 (1990); Beattie et al, Clinical Chemistry,
39: 719-722 (1993); Maskos and Southern, Nucleic Acids Research, 20: 1679-1684
(1992); and the like.

As mentioned above, tag complements may also be synthesized on a single
(or a few) solid phase support to form an array of regions uniformly coated with tag
complements. That is, within each region in such an array the same tag complement
is synthesized. Techniques for synthesizing such arrays are disclosed in McGall et al,
International application PCT/US93/03767; Pease et al, Proc. Natl. Acad. Sci., 91:
5022-5026 (1994); Southern and Maskos, International application
PCT/GB89/01114; Maskos and Southern (cited above); Southern et al, Genomics, 13:
1008-1017 (1992); and Maskos and Southern, Nucleic Acids Research, 21: 4663-4669
(1993).

Preferably, the invention is implemented with microparticles or beads
uniformly coated with complements of the same tag sequence. Microparticle supports
and methods of covalently or noncovalently linking oligonucleotides to their surfaces
are well known, as exemplified by the following references: Beaucage and Iyer (cited
above); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press,
Oxford, 1984); and the references cited above. Generally, the size and shape of a
microparticle is not critical; however, microparticles in the size range of a few, e.g. 1-2,
to several hundred, e.g. 200-1000, µm diameter are preferable, as they facilitate the
construction and manipulation of large repertoires of oligonucleotide tags with
minimal reagent and sample usage.

In some preferred applications, commercially available controlled-pore glass
(CPG) or polystyrene supports are employed as solid phase supports in the invention.
Such supports come available with base-labile linkers and initial nucleosides attached,
e.g. Applied Biosystems (Foster City, CA). Preferably, microparticles having pore
size between 500 and 1000 angstroms are employed.

In other preferred applications, non-porous microparticles are employed for
their optical properties, which may be advantageously used when tracking large
numbers of microparticles on planar supports, such as a microscope slide.
Particularly preferred non-porous microparticles are the glycidyl methacrylate (GMA)
beads available from Bangs Laboratories (Carmel, IN). Such microparticles are
useful in a variety of sizes and derivatized with a variety of linkage groups for
synthesizing tags or tag complements. Preferably, for massively parallel
manipulations of tagged microparticles, 5 µm diameter GMA beads are employed.



Attaching Tags to Polynucleotides
For Sorting onto Solid Phase Supports
An important aspect of the invention is the sorting and attachment of
populations of polynucleotides, e.g. from a cDNA library, to microparticles or to
separate regions on a solid phase support such that each microparticle or region has
substantially only one kind of polynucleotide attached. This objective is
accomplished by ensuring that substantially all different polynucleotides have
different tags attached. This condition, in turn, is brought about by taking a sample of
the full ensemble of tag-polynucleotide conjugates for analysis. (It is acceptable that
identical polynucleotides have different tags, as it merely results in the same
polynucleotide being operated on or analyzed twice in two different locations.) Such
sampling can be carried out either overtly, for example by taking a small volume
from a larger mixture after the tags have been attached to the polynucleotides; it can
be carried out inherently as a secondary effect of the techniques used to process the
polynucleotides and tags; or sampling can be carried out both overtly and as an
inherent part of processing steps.

Preferably, in constructing a cDNA library where substantially all different
cDNAs have different tags, a tag repertoire is employed whose complexity, or number
of distinct tags, greatly exceeds the total number of mRNAs extracted from a cell or
tissue sample. Preferably, the complexity of the tag repertoire is at least 10 times that
of the polynucleotide population; and more preferably, the complexity of the tag
repertoire is at least 100 times that of the polynucleotide population. Below, a
protocol is disclosed for cDNA library construction using a primer mixture that
contains a full repertoire of exemplary 9-word tags. Such a mixture of tag-containing
primers has a complexity of 8⁹, or about 1.34 × 10⁸. As indicated by Winslow et al,
Nucleic Acids Research, 19: 3251-3253 (1991), mRNA for library construction can
be extracted from as few as 10-100 mammalian cells. Since a single mammalian cell
contains about 5 × 10⁵ copies of mRNA molecules of about 3.4 × 10⁴ different kinds,
by standard techniques one can isolate the mRNA from about 100 cells, or
(theoretically) about 5 × 10⁷ mRNA molecules. Comparing this number to the
complexity of the primer mixture shows that without any additional steps, and even
assuming that mRNAs are converted into cDNAs with perfect efficiency (1%
efficiency or less is more accurate), the cDNA library construction protocol results in
a population containing no more than 37% of the total number of different tags. That
is, without any overt sampling step at all, the protocol inherently generates a sample
that comprises 37%, or less, of the tag repertoire. The probability of obtaining a
double under these conditions is about 5%, which is within the preferred range. With
mRNA from 10 cells, the fraction of the tag repertoire sampled is reduced to only
3.7%, even assuming that all the processing steps take place at 100% efficiency. In
fact, the efficiencies of the processing steps for constructing cDNA libraries are very
low, a "rule of thumb" being that a good library should contain about 10⁸ cDNA clones
from mRNA extracted from 10⁶ mammalian cells.
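
The arithmetic in the preceding paragraph can be checked with a short sketch. It is illustrative only: the function names are hypothetical, the per-cell mRNA count is the figure quoted above, and reading the "about 5%" figure as the Poisson probability of exactly two molecules sharing a tag is an assumption, not something the text states explicitly.

```python
import math

WORDS_PER_POSITION = 8      # distinct words available at each tag position
WORDS_PER_TAG = 9           # the exemplary 9-word tags
MRNA_PER_CELL = 5e5         # mRNA copies per mammalian cell, as quoted in the text


def repertoire_size(words=WORDS_PER_POSITION, length=WORDS_PER_TAG):
    """Number of distinct tags (the 'complexity' of the repertoire)."""
    return words ** length


def molecules_per_tag(n_cells, efficiency=1.0):
    """Expected number of mRNA molecules per tag; an upper bound on the
    fraction of the repertoire that ends up in the library."""
    return n_cells * MRNA_PER_CELL * efficiency / repertoire_size()


def poisson(r, lam):
    """Poisson probability of exactly r events when the mean is lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)


lam_100 = molecules_per_tag(100)
print(repertoire_size())                 # 134217728, i.e. about 1.34e8
print(round(lam_100, 3))                 # about 0.37 for 100 cells
print(round(poisson(2, lam_100), 3))     # about 0.05, assumed meaning of "double"
print(round(molecules_per_tag(10), 4))   # about 0.037 for 10 cells
```

Passing a realistic `efficiency` (for example 0.01) shows how much smaller the inherently sampled fraction is in practice.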

When larger amounts of mRNA are used in the above protocol, or larger amounts
of polynucleotides in general, such that the number of such molecules exceeds the
complexity of the tag repertoire, a tag-polynucleotide conjugate mixture potentially
contains every possible pairing of tags and types of mRNA or polynucleotide. In such
cases, overt sampling may be implemented by removing a sample volume after a
serial dilution of the starting mixture of tag-polynucleotide conjugates. The amount
of dilution required depends on the amount of starting material and the efficiencies of
the processing steps, which are readily estimated.

If mRNA were extracted from 10⁶ cells (which would correspond to about 0.5
µg of poly(A)+ RNA), and if primers were present in about 10-100 fold concentration
excess, as is called for in a typical protocol, e.g. Sambrook et al, Molecular Cloning,
Second Edition, page 8.61 [10 µL 1.8 kb mRNA at 1 mg/mL equals about 1.68 × 10⁻¹¹
moles and 10 µL 18-mer primer at 1 mg/mL equals about 1.68 × 10⁻⁹ moles], then the
total number of tag-polynucleotide conjugates in a cDNA library would simply be
equal to or less than the starting number of mRNAs, or about 5 × 10¹¹ vectors
containing tag-polynucleotide conjugates. Again, this assumes that each step in cDNA
construction (first strand synthesis, second strand synthesis, ligation into a vector)
occurs with perfect efficiency, which is a very conservative estimate. The actual
number is significantly less.
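
The molar quantities quoted in brackets above can be reproduced with the following sketch. The average mass of 330 g/mol per nucleotide is an assumed round figure that is not given in the text, and the function name is hypothetical.

```python
AVOGADRO = 6.022e23
AVG_NT_MASS = 330.0   # g/mol per nucleotide; an assumed average, not from the text


def moles(volume_ul, conc_mg_per_ml, length_nt):
    """Moles of a nucleic acid of length_nt in volume_ul at conc_mg_per_ml."""
    mass_g = volume_ul * conc_mg_per_ml * 1e-6   # 1 uL at 1 mg/mL holds 1 ug
    return mass_g / (length_nt * AVG_NT_MASS)


mrna_mol = moles(10, 1.0, 1800)    # 10 uL of 1.8 kb mRNA at 1 mg/mL
primer_mol = moles(10, 1.0, 18)    # 10 uL of 18-mer primer at 1 mg/mL
print(f"{mrna_mol:.2e}", f"{primer_mol:.2e}")   # about 1.68e-11 and 1.68e-09

# Mass of the mRNA from 1e6 cells at 5e5 molecules per cell, as 1.8 kb molecules
n_molecules = 1e6 * 5e5
mass_ug = n_molecules / AVOGADRO * 1800 * AVG_NT_MASS * 1e6
print(round(mass_ug, 2))           # roughly 0.5 ug, matching the text
```

Under these assumptions the primer is in 100-fold molar excess over the mRNA, which sits at the upper end of the 10-100 fold range mentioned above.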

If a sample of n tag-polynucleotide conjugates is randomly drawn from a
reaction mixture, as could be effected by taking a sample volume, the probability of
drawing conjugates having the same tag is described by the Poisson distribution,
$P(r) = e^{-\lambda}\lambda^{r}/r!$, where r is the number of conjugates having the same tag and
$\lambda = np$, where p is the probability of a given tag being selected. If n = 10⁶ and
p = 1/(1.34 × 10⁸), then λ = 0.00746 and P(2) = 2.76 × 10⁻⁵. Thus, a sample of one million
molecules gives rise to an expected number of doubles well within the preferred range. Such a
sample is readily obtained as follows: Assume that the 5 × 10¹¹ mRNAs are perfectly
converted into 5 × 10¹¹ vectors with tag-cDNA conjugates as inserts and that the
5 × 10¹¹ vectors are in a reaction solution having a volume of 100 µL. Four 10-fold serial
dilutions may be carried out by transferring 10 µL from the original solution into a
vessel containing 90 µL of an appropriate buffer, such as TE. This process may be
repeated for three additional dilutions to obtain a 100 µL solution containing 5 × 10⁵
vector molecules per µL. A 2 µL aliquot from this solution yields 10⁶ vectors
containing tag-cDNA conjugates as inserts. This sample is then amplified by
straightforward transformation of a competent host cell followed by culturing.
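
As a check on the figures above, the following sketch evaluates the Poisson term and the serial dilution. It assumes perfect conversion and perfectly mixed dilutions, as the text does; the variable and function names are illustrative.

```python
import math


def poisson(r, lam):
    """Poisson probability of exactly r conjugates sharing one tag."""
    return math.exp(-lam) * lam ** r / math.factorial(r)


n = 1e6                    # conjugates in the sample
p = 1 / 1.34e8             # chance that a given tag is selected
lam = n * p
print(round(lam, 5))               # about 0.00746
print(f"{poisson(2, lam):.2e}")    # about 2.76e-05


def dilute(conc_per_ul, steps, factor=10):
    """Concentration after a number of equal serial dilutions."""
    return conc_per_ul / factor ** steps


start = 5e11 / 100         # 5e11 vectors in 100 uL
final = dilute(start, 4)   # four 10-fold dilutions
aliquot = 2 * final        # vectors in a 2 uL aliquot
print(final, aliquot)      # 5e5 vectors per uL, 1e6 vectors in the aliquot
```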

Of course, as mentioned above, no step in the above process proceeds with
perfect efficiency. In particular, when vectors are employed to amplify a sample of
tag-polynucleotide conjugates, the step of transforming a host is very inefficient.
Usually, no more than 1% of the vectors are taken up by the host and replicated.
Thus, for such a method of amplification, even fewer dilutions would be required to
obtain a sample of 10⁶ conjugates.

A repertoire of oligonucleotide tags can be conjugated to a population of
polynucleotides in a number of ways, including direct enzymatic ligation,
amplification, e.g. via PCR, using primers containing the tag sequences, and the like.
The initial ligating step produces a very large population of tag-polynucleotide
conjugates such that a single tag is generally attached to many different
polynucleotides. However, as noted above, by taking a sufficiently small sample of
the conjugates, the probability of obtaining "doubles," i.e. the same tag on two
different polynucleotides, can be made negligible. Generally, the larger the sample
the greater the probability of obtaining a double. Thus, a design trade-off exists
between selecting a large sample of tag-polynucleotide conjugates, which, for
example, ensures adequate coverage of a target polynucleotide in a shotgun
sequencing operation or adequate representation of a rapidly changing mRNA pool,
and selecting a small sample, which ensures that a minimal number of doubles will be
present. In most embodiments, the presence of doubles merely adds an additional
source of noise or, in the case of sequencing, a minor complication in scanning and
signal processing, as microparticles giving multiple fluorescent signals can simply be
ignored.

As used herein, the term "substantially all" in reference to attaching tags to
molecules, especially polynucleotides, is meant to reflect the statistical nature of the
sampling procedure employed to obtain a population of tag-molecule conjugates
essentially free of doubles. The meaning of substantially all in terms of actual
percentages of tag-molecule conjugates depends on how the tags are being employed.
Preferably, for nucleic acid sequencing, substantially all means that at least eighty
percent of the polynucleotides have unique tags attached. More preferably, it means
that at least ninety percent of the polynucleotides have unique tags attached. Still
more preferably, it means that at least ninety-five percent of the polynucleotides have
unique tags attached. And, most preferably, it means that at least ninety-nine percent
of the polynucleotides have unique tags attached.

Preferably, when the population of polynucleotides consists of messenger
RNA (mRNA), oligonucleotide tags may be attached by reverse transcribing the
mRNA with a set of primers preferably containing complements of tag sequences.
An exemplary set of such primers could have the following sequence (SEQ ID NO: 

5'-mRNA-[A]n-3'
        [T]19GG[W,W,W,C]9AC CAGCTG ATC-5'-biotin

where "[W,W,W,C]9" represents the sequence of an oligonucleotide tag of nine
subunits of four nucleotides each and "[W,W,W,C]" represents the subunit sequences
listed above, i.e. "W" represents T or A. The underlined sequences identify an
optional restriction endonuclease site that can be used to release the polynucleotide
from attachment to a solid phase support via the biotin, if one is employed. For the
above primer, the complement attached to a microparticle could have the form:

5'-[G,W,W,W]9TGG-linker-microparticle

After reverse transcription, the mRNA is removed, e.g. by RNase H digestion,
and the second strand of the cDNA is synthesized using, for example, a primer of the
following form (SEQ ID NO: 2):

5'-NRRGATCYNNN-3'

where N is any one of A, T, G, or C; R is a purine-containing nucleotide, and Y is a
pyrimidine-containing nucleotide. This particular primer creates a Bst YI restriction
site in the resulting double stranded DNA which, together with the Sal I site,
facilitates cloning into a vector with, for example, Bam HI and Xho I sites. After Bst
YI and Sal I digestion, the exemplary conjugate would have the form:

5'-RCGACCA[C,W,W,W]9GG[T]19- cDNA -NNNR
      GGT[G,W,W,W]9CC[A]19- rDNA -NNNYCTAG-5'

The polynucleotide-tag conjugates may then be manipulated using standard molecular
biology techniques. For example, the above conjugate, which is actually a mixture,
may be inserted into commercially available cloning vectors, e.g. Stratagene Cloning
System (La Jolla, CA), and transfected into a host, such as a commercially available host
bacteria, which is then cultured to increase the number of conjugates. The cloning
vectors may then be isolated using standard techniques, e.g. Sambrook et al,
Molecular Cloning, Second Edition (Cold Spring Harbor Laboratory, New York,
1989). Alternatively, appropriate adaptors and primers may be employed so that the
conjugate population can be increased by PCR.

Preferably, when the ligase-based method of sequencing is employed, the Bst
YI and Sal I digested fragments are cloned into a Bam HI-/Xho I-digested vector
having the following single-copy restriction sites (SEQ ID NO: 3):

5'-GAGGATGCCTTTATGGATCCACTCGAGATCCCAATCCA-3'
     FokI        BamHI  XhoI

This adds the Fok I site which will allow initiation of the sequencing process
discussed more fully below.

Tags can be conjugated to cDNAs of existing libraries by standard cloning
methods. cDNAs are excised from their existing vector, isolated, and then ligated into
a vector containing a repertoire of tags. Preferably, the tag-containing vector is
linearized by cleaving with two restriction enzymes so that the excised cDNAs can be
ligated in a predetermined orientation. The concentration of the linearized
tag-containing vector is in substantial excess over that of the cDNA inserts so that
ligation provides an inherent sampling of tags.

A general method for exposing the single stranded tag after amplification
involves digesting a target polynucleotide-containing conjugate with the 3'→5'
exonuclease activity of T4 DNA polymerase, or a like enzyme. When used in the
presence of a single deoxynucleoside triphosphate, such a polymerase will cleave
nucleotides from 3' recessed ends present on the non-template strand of a double
stranded fragment until a complement of the single deoxynucleoside triphosphate is
reached on the template strand. When such a nucleotide is reached, the 3'→5'
digestion effectively ceases, as the polymerase's extension activity adds nucleotides at
a higher rate than the excision activity removes nucleotides. Consequently, single
stranded tags, constructed with three nucleotides, are readily prepared for loading onto
solid phase supports.

The technique may also be used to preferentially methylate interior Fok I sites
of a target polynucleotide while leaving a single Fok I site at the terminus of the
polynucleotide unmethylated. First, the terminal Fok I site is rendered single stranded
using a polymerase with deoxycytidine triphosphate. The double stranded portion of
the fragment is then methylated, after which the single stranded terminus is filled in
with a DNA polymerase in the presence of all four nucleoside triphosphates, thereby
regenerating the Fok I site. Clearly, this procedure can be generalized to
endonucleases other than Fok I.

After the oligonucleotide tags are prepared for specific hybridization, e.g. by
rendering them single stranded as described above, the polynucleotides are mixed
with microparticles containing the complementary sequences of the tags under
conditions that favor the formation of perfectly matched duplexes between the tags
and their complements. There is extensive guidance in the literature for creating these
conditions. Exemplary references providing such guidance include Wetmur, Critical
Reviews in Biochemistry and Molecular Biology, 26: 227-259 (1991); Sambrook et
al, Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor
Laboratory, New York, 1989); and the like. Preferably, the hybridization conditions
are sufficiently stringent so that only perfectly matched sequences form stable
duplexes. Under such conditions the polynucleotides specifically hybridized through
their tags may be ligated to the complementary sequences attached to the
microparticles. Finally, the microparticles are washed to remove polynucleotides with
unligated and/or mismatched tags.

When CPG microparticles conventionally employed as synthesis supports are
used, the density of tag complements on the microparticle surface is typically greater
than that necessary for some sequencing operations. That is, in sequencing
approaches that require successive treatment of the attached polynucleotides with a
variety of enzymes, densely spaced polynucleotides may tend to inhibit access of the
relatively bulky enzymes to the polynucleotides. In such cases, the polynucleotides
are preferably mixed with the microparticles so that tag complements are present in
significant excess, e.g. from 10:1 to 100:1, or greater, over the polynucleotides. This
ensures that the density of polynucleotides on the microparticle surface will not be so
high as to inhibit enzyme access. Preferably, the average inter-polynucleotide spacing
on the microparticle surface is on the order of 30-100 nm. Guidance in selecting
ratios for standard CPG supports and Ballotini beads (a type of solid glass support) is
found in Maskos and Southern, Nucleic Acids Research, 20: 1679-1684 (1992).
Preferably, for sequencing applications, standard CPG beads of diameter in the range
of 20-50 µm are loaded with about 10⁵ polynucleotides, and GMA beads of diameter
in the range of 5-10 µm are loaded with a few tens of thousands of polynucleotides,
e.g. 4 × 10⁴ to 6 × 10⁴.

In the preferred embodiment, tag complements are synthesized on
microparticles combinatorially; thus, at the end of the synthesis, one obtains a
complex mixture of microparticles from which a sample is taken for loading tagged
polynucleotides. The size of the sample of microparticles will depend on several
factors, including the size of the repertoire of tag complements, the nature of the
apparatus used for observing loaded microparticles (e.g. its capacity), the tolerance
for multiple copies of microparticles with the same tag complement (i.e. "bead
doubles"), and the like. The following table provides guidance regarding
microparticle sample size, microparticle diameter, and the approximate physical
dimensions of a packed array of microparticles of various diameters.



Microparticle diameter           5 µm           10 µm       20 µm         40 µm

Max. no. polynucleotides
loaded at 1 per 10⁵ sq.
angstrom                                        3 × 10⁵     1.26 × 10⁶    5 × 10⁶

Approx. area of monolayer
of 10⁶ microparticles            .45 × .45 cm   1 × 1 cm    2 × 2 cm      4 × 4 cm
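
The entries in the table above appear to follow from simple geometry, which the sketch below illustrates. It treats each microparticle as a smooth sphere of surface area πd² and the monolayer as a square-packed grid; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def max_loading(diameter_um, sq_angstrom_per_molecule=1e5):
    """Molecules that fit on a smooth sphere at one per given area."""
    d_angstrom = diameter_um * 1e4            # 1 um = 1e4 angstrom
    surface = math.pi * d_angstrom ** 2       # sphere surface area
    return surface / sq_angstrom_per_molecule


def monolayer_side_cm(diameter_um, n_beads=1e6):
    """Side of a square-packed monolayer of n_beads, in cm."""
    return math.sqrt(n_beads) * diameter_um * 1e-4   # 1 um = 1e-4 cm


for d in (10, 20, 40):
    print(d, f"{max_loading(d):.2e}", round(monolayer_side_cm(d), 2))
```

Under these assumptions the loading values of about 3 × 10⁵, 1.26 × 10⁶, and 5 × 10⁶ correspond to the 10, 20, and 40 µm columns. Square packing gives a 0.5 cm side for 5 µm beads, slightly above the tabulated .45 cm, which suggests the table assumes a denser packing.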



The probability that a given tag complement is present in multiple copies in the
sample of microparticles is described by the Poisson distribution, as indicated in
the following table.



Table VII

Number of              Fraction of           Fraction of            Fraction of microparticles
microparticles in      repertoire of tag     microparticles in      in sample carrying same tag
sample (as fraction    complements           sample with unique     complement as one other
of repertoire size),   present in sample,    tag complement         microparticle in sample
m                      $1 - e^{-m}$          attached,              ("bead doubles"),
                                             $m e^{-m}$             $m^{2} e^{-m}/2$

1.000                  0.63                  0.37                   0.18
.693                   0.50                  0.35                   0.12
.405                   0.33                  0.27                   0.05
.285                   0.25                  0.21                   0.03
.223                   0.20                  0.18                   0.02
.105                   0.10                  0.09                   0.005
.010                   0.01                  0.01
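
The tabulated values can be regenerated from the Poisson distribution with mean m, as in the sketch below. The column formulas used here, 1 - e^(-m), m·e^(-m), and m²·e^(-m)/2, are the ones that reproduce the listed numbers; the function name is illustrative.

```python
import math


def table_vii_row(m):
    """Return (fraction of repertoire present, fraction unique, fraction doubles)
    for a bead sample whose size is m times the repertoire size."""
    present = 1 - math.exp(-m)           # tag complement occurs at least once
    unique = m * math.exp(-m)            # Poisson P(1)
    doubles = m ** 2 * math.exp(-m) / 2  # Poisson P(2)
    return present, unique, doubles


for m in (1.000, 0.693, 0.405, 0.285, 0.223, 0.105, 0.010):
    present, unique, doubles = table_vii_row(m)
    print(f"{m:5.3f}  {present:4.2f}  {unique:4.2f}  {doubles:6.4f}")
```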



High Specificity Sorting and Panning 

The kinetics of sorting depends on the rate of hybridization of oligonucleotide
tags to their tag complements which, in turn, depends on the complexity of the tags in
the hybridization reaction. Thus, a trade-off exists between sorting rate and tag
complexity, such that an increase in sorting rate may be achieved at the cost of
reducing the complexity of the tags involved in the hybridization reaction. As
explained below, the effects of this trade-off may be ameliorated by "panning."

Specificity of the hybridizations may be increased by taking a sufficiently
small sample so that both a high percentage of tags in the sample are unique and the
nearest neighbors of substantially all the tags in a sample differ by at least two words.
This latter condition may be met by taking a sample that contains a number of
tag-polynucleotide conjugates that is about 0.1 percent or less of the size of the repertoire
being employed. For example, if tags are constructed with eight words selected from
Table II, a repertoire of 8⁸, or about 1.67 × 10⁷, tags and tag complements is
produced. In a library of tag-cDNA conjugates as described above, a 0.1 percent
sample means that about 16,700 different tags are present. If this were loaded directly
onto a repertoire-equivalent of microparticles, or in this example a sample of 1.67 ×
10⁷ microparticles, then only a sparse subset of the sampled microparticles would be
loaded. The density of loaded microparticles can be increased, for example for more
efficient sequencing, by undertaking a "panning" step in which the sampled
tag-cDNA conjugates are used to separate loaded microparticles from unloaded
microparticles. Thus, in the example above, even though a "0.1 percent" sample
contains only 16,700 cDNAs, the sampling and panning steps may be repeated until
as many loaded microparticles as desired are accumulated.
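
A rough numerical illustration of why a 0.1 percent sample keeps nearest neighbors apart follows. It assumes that tags in the sample are drawn uniformly and independently from the repertoire, which is a simplification introduced here rather than a statement from the text.

```python
WORDS = 8      # words available at each position
LENGTH = 8     # words per tag in this example

repertoire = WORDS ** LENGTH              # 16,777,216, about 1.67e7
fraction = 0.001                          # a "0.1 percent" sample
sample = round(fraction * repertoire)     # about 16,700 tags

# Tags that differ from a given tag in exactly one word
single_word_neighbors = LENGTH * (WORDS - 1)

# Chance that none of those neighbors is also present in the sample
p_isolated = (1 - fraction) ** single_word_neighbors

print(repertoire, sample)
print(single_word_neighbors, round(p_isolated, 3))
```

Under this assumption roughly 95% of sampled tags have no one-word neighbor in the same sample, and only about one microparticle in a thousand of a repertoire-equivalent would be loaded, which is the sparse loading that panning addresses.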

A panning step may be implemented by providing a sample of tag-cDNA
conjugates each of which contains a capture moiety at an end opposite, or distal, to
the oligonucleotide tag. Preferably, the capture moiety is of a type which can be
released from the tag-cDNA conjugates, so that the tag-cDNA conjugates can be
sequenced with a single-base sequencing method. Such moieties may comprise
biotin, digoxigenin, or like ligands, a triplex binding region, or the like. Preferably,
such a capture moiety comprises a biotin component. Biotin may be attached to
tag-cDNA conjugates by a number of standard techniques. If appropriate adapters
containing PCR primer binding sites are attached to tag-cDNA conjugates, biotin may
be attached by using a biotinylated primer in an amplification after sampling.
Alternatively, if the tag-cDNA conjugates are inserts of cloning vectors, biotin may be
attached after excising the tag-cDNA conjugates by digestion with an appropriate
restriction enzyme followed by isolation and filling in a protruding strand distal to the
tags with a DNA polymerase in the presence of biotinylated uridine triphosphate.

After a tag-cDNA conjugate is captured, it may be released from the biotin
moiety in a number of ways, such as by a chemical linkage that is cleaved by
reduction, e.g. Herman et al, Anal. Biochem., 156: 48-55 (1986), or that is cleaved
photochemically, e.g. Olejnik et al, Nucleic Acids Research, 24: 361-366 (1996), or
that is cleaved enzymatically by introducing a restriction site in the PCR primer. The
latter embodiment can be exemplified by considering the library of tag-polynucleotide
conjugates described above:

5'-RCGACCA[C,W,W,W]9GG[T]19- cDNA -NNNR
      GGT[G,W,W,W]9CC[A]19- rDNA -NNNYCTAG-5'

The following adapters may be ligated to the ends of these fragments to permit
amplification by PCR:

5'-XXXXXXXXXXXXXXXXXXXX
   XXXXXXXXXXXXXXXXXXXXYGAT
        Right Adapter

   GATCZZACTAGTZZZZZZZZZZZZ-3'
       ZZTGATCAZZZZZZZZZZZZ
        Left Adapter

       ZZTGATCAZZZZZZZZZZZZ-5'-biotin
        Left Primer

where "ACTAGT" is a Spe I recognition site (which leaves a staggered cleavage
ready for single base sequencing), and the X's and Z's are nucleotides selected so that
the annealing and dissociation temperatures of the respective primers are
approximately the same. After ligation of the adapters and amplification by PCR
using the biotinylated primer, the tags of the conjugates are rendered single stranded
by the exonuclease activity of T4 DNA polymerase and conjugates are combined with
a sample of microparticles, e.g. a repertoire equivalent, with tag complements
attached. After annealing under stringent conditions (to minimize mis-attachment of
tags), the conjugates are preferably ligated to their tag complements and the loaded
microparticles are separated from the unloaded microparticles by capture with
avidinated magnetic beads, or like capture technique.

Returning to the example, this process results in the accumulation of about
10,500 (= 16,700 × .63) loaded microparticles with different tags, which may be
released from the magnetic beads by cleavage with Spe I. By repeating this process
40-50 times with new samples of microparticles and tag-cDNA conjugates, 4-5 × 10⁵
cDNAs can be accumulated by pooling the released microparticles. The pooled
microparticles may then be simultaneously sequenced by a single-base sequencing
technique.
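
The accumulation figures in this example can be sketched as follows. The sketch assumes that each round uses a repertoire-equivalent sample of microparticles (m = 1), so that a fraction 1 - e^(-1), about 0.63, of the sampled tags find a complementary bead, and that every such bead is recovered; the function names are hypothetical.

```python
import math


def loaded_per_round(n_conjugates, m=1.0):
    """Expected loaded beads per round for a bead sample of relative size m."""
    return n_conjugates * (1.0 - math.exp(-m))


def rounds_for(target, n_conjugates, m=1.0):
    """Rounds of sampling and panning needed to pool at least `target` beads."""
    return math.ceil(target / loaded_per_round(n_conjugates, m))


print(round(loaded_per_round(16_700)))      # about 10,500 per round
for target in (4e5, 5e5):
    print(int(target), rounds_for(target, 16_700))
```

With perfect recovery the sketch gives somewhat under 40 and 50 rounds for the two targets, in line with the 40-50 repetitions stated above once some loss per round is allowed for.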

Determining how many times to repeat the sampling and panning steps, or
more generally, determining how many cDNAs to analyze, depends on one's
objective. If the objective is to monitor the changes in abundance of relatively
common sequences, e.g. making up 5% or more of a population, then relatively small
samples, i.e. a small fraction of the total population size, may allow statistically
significant estimates of relative abundances. On the other hand, if one seeks to
monitor the abundances of rare sequences, e.g. making up 0.1% or less of a
population, then large samples are required. Generally, there is a direct relationship
between sample size and the reliability of the estimates of relative abundances based
on the sample. There is extensive guidance in the literature on determining
appropriate sample sizes for making reliable statistical estimates, e.g. Koller et al,
Nucleic Acids Research, 23: 185-191 (1994); Good, Biometrika, 40: 16-264 (1953);
Bunge et al, J. Am. Stat. Assoc., 88: 364-373 (1993); and the like. Preferably, for
monitoring changes in gene expression based on the analysis of a series of cDNA
libraries containing 10⁵ to 10⁸ independent clones of 3.0-3.5 × 10⁴ different
sequences, a sample of at least 10⁴ sequences is accumulated for analysis of each
library. More preferably, a sample of at least 10⁵ sequences is accumulated for the
analysis of each library; and most preferably, a sample of at least 5 × 10⁵ sequences
is accumulated for the analysis of each library. Alternatively, the number of
sequences sampled is preferably sufficient to estimate the relative abundance of a
sequence present at a frequency within the range of 0.1% to 5% with a 95%
confidence limit no larger than 0.1% of the population size.
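
One way to turn the last criterion into a sample size is the textbook normal approximation to the binomial, sketched below. This is an illustrative assumption, not the method of the cited references, and it reads the "95% confidence limit no larger than 0.1%" as a half-width of 0.001 on the estimated frequency.

```python
import math

Z_95 = 1.96   # two-sided 95% normal quantile


def sample_size(frequency, half_width, z=Z_95):
    """Sequences needed so the 95% interval on `frequency` has the given
    half-width, using n = z^2 * p * (1 - p) / half_width^2."""
    return math.ceil(z * z * frequency * (1 - frequency) / half_width ** 2)


for p in (0.001, 0.01, 0.05):
    print(p, sample_size(p, 0.001))
```

Under this reading, a sequence at 5% abundance needs a sample on the order of 2 × 10⁵ sequences and one at 0.1% needs several thousand, which is broadly consistent with the preferred sample sizes of 10⁴ to 5 × 10⁵ given above.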

Single Base DNA Sequencing 
The present invention can be employed with conventional methods of DNA
sequencing, e.g. as disclosed by Hultman et al, Nucleic Acids Research, 17: 4937-4946
(1989). However, for parallel, or simultaneous, sequencing of multiple
polynucleotides, a DNA sequencing methodology is preferred that requires neither
electrophoretic separation of closely sized DNA fragments nor analysis of cleaved
nucleotides by a separate analytical procedure, as in peptide sequencing. Preferably,
the methodology permits the stepwise identification of nucleotides, usually one at a
time, in a sequence through successive cycles of treatment and detection. Such
methodologies are referred to herein as "single base" sequencing methods. Single
base approaches are disclosed in the following references: Cheeseman, U.S. patent
5,302,509; Tsien et al, International application WO 91/06678; Rosenthal et al,
International application WO 93/21340; Canard et al, Gene, 148: 1-6 (1994); and
Metzker et al, Nucleic Acids Research, 22: 4259-4267 (1994).

A "single base" method of DNA sequencing which is suitable for use with the 
present invention and which requires no electrophoretic separation of DNA fragments 
is described in International application PCT/US95/03678. Briefly, the method
comprises the following steps: (a) ligating a probe to an end of the polynucleotide 
having a protruding strand to form a ligated complex, the probe having a 
complementary protruding strand to that of the polynucleotide and the probe having a 
nuclease recognition site; (b) removing unligated probe from the ligated complex; (c) 
identifying one or more nucleotides in the protruding strand of the polynucleotide by 
the identity of the ligated probe; (d) cleaving the ligated complex with a nuclease; and 
(e) repeating steps (a) through (d) until the nucleotide sequence of the polynucleotide, 
or a portion thereof, is determined. 
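
Steps (a) through (e) above can be sketched as a toy simulation. This is only an illustrative model under stated assumptions, not the procedure of the cited application: the function name `sequence_by_ligation`, the four-nucleotide protruding strand, and the one-nucleotide advance per cleavage are all hypothetical choices, and strand polarity is ignored (complements are taken position by position).

```python
# Toy model of repeated ligation, identification, and cleavage cycles.
# All names and parameters here are illustrative assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_ligation(template, overhang=4, cycles=None):
    """Read `template` one nucleotide per cycle.

    template: nucleotides of the target in the order they become exposed.
    overhang: assumed length of the protruding strand.
    cycles:   optional cap on the number of cycles.
    """
    called = []
    position = 0
    max_cycles = len(template) - overhang + 1
    if cycles is None:
        cycles = max_cycles
    for _ in range(min(cycles, max_cycles)):
        protruding = template[position:position + overhang]
        # (a) only a probe with a perfectly complementary protruding
        #     strand forms a ligated complex
        probe = "".join(COMPLEMENT[base] for base in protruding)
        # (b) unligated probes are washed away (implicit in this model)
        # (c) the ligated probe identifies the first exposed nucleotide
        called.append(COMPLEMENT[probe[0]])
        # (d) nuclease cleavage exposes a new protruding strand,
        #     assumed here to be shifted by one nucleotide
        position += 1
        # (e) repeat
    return "".join(called)


if __name__ == "__main__":
    print(sequence_by_ligation("ACGTTGCA"))
```

In this model the last `overhang - 1` nucleotides of the template are never called, because a full protruding strand can no longer be formed.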

A single signal generating moiety, such as a single fluorescent dye, may be 
employed when sequencing several different target polynucleotides attached to 
different spatially addressable solid phase supports, such as fixed microparticles, in a 
parallel sequencing operation. This may be accomplished by providing four sets of 
probes that are applied sequentially to the plurality of target polynucleotides on the 
different microparticles. An exemplary set of such probes is shown below: 



Set 1              Set 2              Set 3              Set 4

ANNNN...NN         dANNNN...NN        dANNNN...NN        dANNNN...NN
 N...NNTT...T*       N...NNTT...T       N...NNTT...T       N...NNTT...T

dCNNNN...NN        CNNNN...NN         dCNNNN...NN        dCNNNN...NN
 N...NNTT...T       N...NNTT...T*       N...NNTT...T       N...NNTT...T

dGNNNN...NN        dGNNNN...NN        GNNNN...NN         dGNNNN...NN
 N...NNTT...T       N...NNTT...T       N...NNTT...T*       N...NNTT...T

dTNNNN...NN        dTNNNN...NN        dTNNNN...NN        TNNNN...NN
 N...NNTT...T       N...NNTT...T       N...NNTT...T       N...NNTT...T*



where each of the listed probes represents a mixture of 4^3 = 64 oligonucleotides such 
that the identity of the 3' terminal nucleotide of the top strand is fixed and the other 
positions in the protruding strand are filled by every 3-mer permutation of nucleotides, 
or complexity-reducing analogs. The listed probes are also shown with a single 
stranded poly-T tail with a signal generating moiety attached to the terminal thymidine, 
shown as "T*". The "d" on the unlabeled probes designates a ligation-blocking moiety 
or absence of 3'-hydroxyl, which prevents unlabeled probes from being ligated. 
Preferably, such 3'-terminal nucleotides are dideoxynucleotides. In this embodiment, 
the probes of set 1 are first applied to the plurality of target polynucleotides and treated 
with a ligase so that target polynucleotides having a thymidine complementary to the 3' 
terminal adenosine of the labeled probes are ligated. The unlabeled probes are 
simultaneously applied to minimize inappropriate ligations. The locations of the target 
polynucleotides that form ligated complexes with probes terminating in "A" are 
identified by the signal generated by the label carried on the probe. After washing and 
cleavage, the probes of set 2 are applied. In this case, target polynucleotides forming 
ligated complexes with probes terminating in "C" are identified by location. Similarly, 
the probes of sets 3 and 4 are applied and locations of positive signals identified. This 
process of sequentially applying the four sets of probes continues until the desired 
number of nucleotides are identified on the target polynucleotides. Clearly, one of 
ordinary skill could construct similar sets of probes that could have many variations, 
such as having protruding strands of different lengths, different moieties to block 
ligation of unlabeled probes, different means for labeling probes, and the like. 
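
The composition of one probe mixture and the way a single label can report four different nucleotides may be sketched as follows. The helper names `probe_mixture` and `call_base` are hypothetical, a four-nucleotide protruding strand is assumed, and the assignment of labeled terminal nucleotides A, C, G, and T to sets 1 through 4 follows the description above; this is an illustration, not a prescribed implementation.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Terminal nucleotide of the labeled probe in each set (assumed ordering).
LABELED_TERMINAL = {1: "A", 2: "C", 3: "G", 4: "T"}


def probe_mixture(terminal_base, n_degenerate=3):
    """List every protruding strand with a fixed terminal nucleotide and
    all combinations of nucleotides in the remaining positions."""
    return [terminal_base + "".join(rest)
            for rest in product(BASES, repeat=n_degenerate)]


def call_base(signals_by_set):
    """Infer the target nucleotide at one microparticle location.

    signals_by_set maps a set number (1-4) to True if a signal was
    observed at that location after applying that set. Returns the
    target nucleotide, or None if zero or several sets gave a signal.
    """
    positives = [s for s, hit in signals_by_set.items() if hit]
    if len(positives) != 1:
        return None
    # The labeled probe ligates where the target nucleotide is
    # complementary to the probe's terminal nucleotide.
    return COMPLEMENT[LABELED_TERMINAL[positives[0]]]


if __name__ == "__main__":
    print(len(probe_mixture("A")))
    print(call_base({1: True, 2: False, 3: False, 4: False}))
```

Because the four sets are applied one after another, the identity of the nucleotide is encoded by which application produced the signal rather than by the color of the label.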

Apparatus for Sequencing Populations of Polynucleotides 
An objective of the invention is to sort identical molecules, particularly 
polynucleotides, onto the surfaces of microparticles by the specific hybridization of 
tags and their complements. Once such sorting has taken place, the presence of the 
molecules or operations performed on them can be detected in a number of ways 
depending on the nature of the tagged molecule, whether microparticles are detected 
separately or in "batches," whether repeated measurements are desired, and the like. 
Typically, the sorted molecules are exposed to ligands for binding, e.g. in drug 
development, or are subjected to chemical or enzymatic processes, e.g. in polynucleotide 
sequencing. In both of these uses it is often desirable to simultaneously observe 
signals corresponding to such events or processes on large numbers of microparticles. 
Microparticles carrying sorted molecules (referred to herein as "loaded" 
microparticles) lend themselves to such large scale parallel operations, e.g. as 
demonstrated by Lam et al (cited above). 

Preferably, whenever light-generating signals, e.g. chemiluminescent, 
fluorescent, or the like, are employed to detect events or processes, loaded 
microparticles are spread on a planar substrate, e.g. a glass slide, for examination with 
a scanning system, such as described in International patent applications 
PCT/US91/09217, PCT/NL90/00081, and PCT/US95/01886. The scanning system 
should be able to reproducibly scan the substrate and to define the positions of each 
microparticle in a predetermined region by way of a coordinate system. In 
polynucleotide sequencing applications, it is important that the positional 
identification of microparticles be repeatable in successive scan steps. 

Such scanning systems may be constructed from commercially available 
components, e.g. an x-y translation table controlled by a digital computer used with a 
detection system comprising one or more photomultiplier tubes, or alternatively, a 
CCD array, and appropriate optics, e.g. for exciting, collecting, and sorting 
fluorescent signals. In some embodiments a confocal optical system may be 
desirable. An exemplary scanning system suitable for use in four-color sequencing is 
illustrated diagrammatically in Figure 5. Substrate 300, e.g. a microscope slide with 
fixed microparticles, is placed on x-y translation table 302, which is connected to and 
controlled by an appropriately programmed digital computer 304 which may be any of 
a variety of commercially available personal computers, e.g. 486-based machines or 
PowerPC model 7100 or 8100 available from Apple Computer (Cupertino, CA). 
Computer software for table translation and data collection functions can be provided 
by commercially available laboratory software, such as LabWindows, available from 
National Instruments. 

Substrate 300 and table 302 are operationally associated with microscope 306 
having one or more objective lenses 308 which are capable of collecting and 
delivering light to microparticles fixed to substrate 300. Excitation beam 310 from 
light source 312, which is preferably a laser, is directed to beam splitter 314, e.g. a 
dichroic mirror, which re-directs the beam through microscope 306 and objective lens 
308 which, in turn, focuses the beam onto substrate 300. Lens 308 collects 
fluorescence 316 emitted from the microparticles and directs it through beam splitter 
314 to signal distribution optics 318 which, in turn, directs fluorescence to one or 
more suitable opto-electronic devices for converting some fluorescence characteristic, 
e.g. intensity, lifetime, or the like, to an electrical signal. Signal distribution optics 
318 may comprise a variety of components standard in the art, such as bandpass 
filters, fiber optics, rotating mirrors, fixed position mirrors and lenses, diffraction 
gratings, and the like. As illustrated in Figure 5, signal distribution optics 318 directs 
fluorescence 316 to four separate photomultiplier tubes, 330, 332, 334, and 336, 
whose output is then directed to pre-amps and photon counters 350, 352, 354, and 
356. The output of the photon counters is collected by computer 304, where it can be 
stored, analyzed, and viewed on video 360. Alternatively, signal distribution optics 
318 could be a diffraction grating which directs fluorescent signal 316 onto a CCD 
array. 

The stability and reproducibility of the positional localization in scanning will 
determine, to a large extent, the resolution for separating closely spaced 
microparticles. Preferably, the scanning systems should be capable of resolving 
closely spaced microparticles, e.g. separated by a particle diameter or less. Thus, for 
most applications, e.g. using CPG microparticles, the scanning system should at least 
have the capability of resolving objects on the order of 10-100 µm. Even higher 
resolution may be desirable in some embodiments, but with increased resolution the 
time required to fully scan a substrate will increase; thus, in some embodiments a 
compromise may have to be made between speed and resolution. Reductions in 
scanning time can be achieved by a system which only scans positions where 
microparticles are known to be located, e.g. from an initial full scan. Preferably, 
microparticle size and scanning system resolution are selected to permit resolution of 
fluorescently labeled microparticles randomly disposed on a plane at a density 
between about ten thousand and one hundred thousand microparticles per cm². 
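
The relation between the stated particle densities and the stated resolution range can be checked with a short calculation. The sketch below assumes the microparticles are scattered as a uniform random (Poisson) pattern on the plane, for which the mean nearest-neighbor distance is 1/(2·sqrt(density)); the function name is a hypothetical helper and the model ignores particle size.

```python
import math


def mean_nearest_neighbor_um(density_per_cm2):
    """Mean nearest-neighbor distance, in micrometers, for points
    scattered at random on a plane at the given density per cm^2
    (Poisson assumption; finite particle size is ignored)."""
    distance_cm = 1.0 / (2.0 * math.sqrt(density_per_cm2))
    return distance_cm * 1.0e4  # 1 cm = 10^4 micrometers


if __name__ == "__main__":
    for density in (1.0e4, 1.0e5):
        print(density, round(mean_nearest_neighbor_um(density), 1))
```

Under this assumption the typical spacing falls from roughly 50 µm at ten thousand particles per cm² to roughly 16 µm at one hundred thousand per cm², which lies within the 10-100 µm resolution range mentioned above.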

In sequencing applications, loaded microparticles can be fixed to the surface 
of a substrate in a variety of ways. The fixation should be strong enough to allow the 
microparticles to undergo successive cycles of reagent exposure and washing without 
significant loss. When the substrate is glass, its surface may be derivatized with an 
alkylamino linker using commercially available reagents, e.g. Pierce Chemical, which 
in turn may be cross-linked to avidin, again using conventional chemistries, to form 
an avidinated surface. Biotin moieties can be introduced to the loaded microparticles 
in a number of ways. For example, a fraction, e.g. 10-15 percent, of the cloning 
vectors used to attach tags to polynucleotides are engineered to contain a unique 
restriction site (providing sticky ends on digestion) immediately adjacent to the 
polynucleotide insert at an end of the polynucleotide opposite of the tag. The site is 
excised with the polynucleotide and tag for loading onto microparticles. After 
loading, about 10-15 percent of the loaded polynucleotides will possess the unique 
restriction site distal from the microparticle surface. After digestion with the 
associated restriction endonuclease, an appropriate double stranded adaptor 
containing a biotin moiety is ligated to the sticky end. The resulting microparticles 
are then spread on the avidinated glass surface where they become fixed via the 
biotin-avidin linkages. 

Alternatively and preferably when sequencing by ligation is employed, in the 
initial ligation step a mixture of probes is applied to the loaded microparticle: a 
fraction of the probes contain a type IIs restriction recognition site, as required by the 
sequencing method, and a fraction of the probes have no such recognition site, but 
instead contain a biotin moiety at their non-ligating ends. Preferably, the mixture 
comprises about 10-15 percent of the biotinylated probe. 

In still another alternative, when DNA-loaded microparticles are applied to a 
glass substrate, the DNA may nonspecifically adsorb to the glass surface upon several 
hours, e.g. 24 hours, incubation to create a bond sufficiently strong to permit repeated 
exposures to reagents and washes without significant loss of microparticles. 
Preferably, such a glass substrate is a flow cell, which may comprise a channel etched 
in a glass slide. Preferably, such a channel is closed so that fluids may be pumped 
through it and has a depth sufficiently close to the diameter of the microparticles so 
that a monolayer of microparticles is trapped within a defined observation region. 

Identification of Novel Polynucleotides 
in cDNA Libraries 

Novel polynucleotides in a cDNA library can be identified by constructing a 
library of cDNA molecules attached to microparticles, as described above. A large 
fraction of the library, or even the entire library, can then be partially sequenced in 
parallel. After isolation of mRNA, and perhaps normalization of the population as 
taught by Soares et al, Proc. Natl. Acad. Sci., 91: 9228-9232 (1994), or like 
references, the following primer may be hybridized to the polyA tails for first strand 
synthesis with a reverse transcriptase using conventional protocols (SEQ ID NO: 1): 

5'-mRNA-[A]n-3' 
        [T]19-[primer site]-GG[W,W,W,C]9ACCAGCTGATC-5' 

where [W,W,W,C]9 represents a tag as described above, "ACCAGCTGATC" is an 
optional sequence forming a restriction site in double stranded form, and "primer site" 
is a sequence common to all members of the library that is later used as a primer 
binding site for amplifying polynucleotides of interest by PCR. 

After reverse transcription and second strand synthesis by conventional 
techniques, the double stranded fragments are inserted into a cloning vector as 
described above and amplified. The amplified library is then sampled and the sample 
amplified. The cloning vectors from the amplified sample are isolated, and the tagged 
cDNA fragments excised and purified. After rendering the tag single stranded with a 
polymerase as described above, the fragments are methylated and sorted onto 
microparticles in accordance with the invention. Preferably, as described above, the 
cloning vector is constructed so that the tagged cDNAs can be excised with an 
endonuclease, such as Fok I, that will allow immediate sequencing by the preferred 
single base method after sorting and ligation to microparticles. 

Stepwise sequencing is then carried out simultaneously on the whole library, 
or one or more large fractions of the library, in accordance with the invention until a 
sufficient number of nucleotides are identified on each cDNA for unique 
representation in the genome of the organism from which the library is derived. For 
example, if the library is derived from mammalian mRNA then a randomly selected 
sequence 14-15 nucleotides long is expected to have unique representation among the 
2-3 thousand megabases of the typical mammalian genome. Of course identification 
of far fewer nucleotides would be sufficient for unique representation in a library 
derived from bacteria, or other lower organisms. Preferably, at least 20-30 
nucleotides are identified to ensure unique representation and to permit construction 
of a suitable primer as described below. The tabulated sequences may then be 
compared to known sequences to identify unique cDNAs. 
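
The dependence of unique representation on the number of nucleotides identified can be estimated with a simple model. The sketch below assumes a genome of independent, equally likely nucleotides and counts only one strand, so the figures are order-of-magnitude estimates; both function names are hypothetical helpers.

```python
def expected_random_matches(n, genome_bases):
    """Expected number of chance occurrences of one particular n-mer in
    a random sequence of `genome_bases` nucleotides (uniform base
    composition assumed, one strand counted)."""
    return genome_bases / 4 ** n


def shortest_length_below_one_match(genome_bases):
    """Smallest n for which fewer than one chance occurrence is expected."""
    n = 1
    while 4 ** n < genome_bases:
        n += 1
    return n


if __name__ == "__main__":
    mammalian = 3_000_000_000  # about 3 thousand megabases
    for n in (14, 15, 20, 30):
        print(n, expected_random_matches(n, mammalian))
    print(shortest_length_below_one_match(mammalian))
```

Under these assumptions the expected number of chance occurrences in a mammalian-sized genome is of order one to ten for 14-15 nucleotides and falls well below one at 20 nucleotides, which is consistent with the stated preference for identifying at least 20-30 nucleotides.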

Unique cDNAs are then isolated by conventional techniques, e.g. constructing 
a probe from the PCR amplicon produced with primers directed to the primer site and 
the portion of the cDNA whose sequence was determined. The probe may then be 
used to identify the cDNA in a library using a conventional screening protocol. 

The above method for identifying new cDNAs may also be used to fingerprint 
mRNA populations, either in isolated measurements or in the context of a 
dynamically changing population. Partial sequence information is obtained 
simultaneously from a large sample, e.g. ten to a hundred thousand, or more, of 
cDNAs attached to separate microparticles as described in the above method. 

Example 1 

Construction of a Tag Library 

An exemplary tag library is constructed as follows to form the chemically 
synthesized 9-word tags of nucleotides A, G, and T defined by the formula: 

3'-TGGC-[4(A,G,T)9]-CCCCp 

where "[4(A,G,T)9]" indicates a tag mixture where each tag consists of nine 4-mer 
words of A, G, and T; and "p" indicates a 5' phosphate. This mixture is ligated to the 
following right and left primer binding regions (SEQ ID NO: 4 and SEQ ID NO: 5): 

CACCGACCCGTAGCCp GGGTCAGTCGCAGCTA 
LEFT RIGHT 

The right and left primer binding regions are ligated to the above tag mixture after 
which the single stranded portion of the ligated structure is filled with DNA 
polymerase then mixed with the right and left primers indicated below and amplified 
to give a tag library (SEQ ID NO: 6). 

Left Primer 

5'-AGTGGCTGGGCATCGGACCG 

CCCCGGGTCAGTCGCAGCTA-5' 
Right Primer 

The underlined portion of the left primer binding region indicates a Rsr II recognition 
site. The left-most underlined region of the right primer binding region indicates 
recognition sites for Bsp 120I, Apa I, and Eco O109I, and a cleavage site for Hga I. 
The right-most underlined region of the right primer binding region indicates the 
recognition site for Hga I. Optionally, the right or left primers may be synthesized 
with a biotin attached (using conventional reagents, e.g. available from Clontech 
Laboratories, Palo Alto, CA) to facilitate purification after amplification and/or 
cleavage. 

             primer binding site    Ppu MI site
                   |                    |
(plasmid)-5'-AAAAGGAGGAGGCCTTGATAGAGAGGACCT-
            -TTTTCCTCCTCCGGAACTATCTCTCCTGGA-

                 primer binding site
                         |
-GTTTAAAC-GGATCC-TCTTCCTCTTCCTCTTCC-3'-(plasmid)
-CAAATTTG-CCTAGG-AGAAGGAGAAGGAGAAGG-
    |       |
    |       Bam HI site
    Pme I site

The plasmid is cleaved with Ppu MI and Pme I (to give a Rsr II-compatible end and a 
flush end so that the insert is oriented) and then methylated with DAM methylase. 
The tag-containing construct is cleaved with Rsr II and then ligated to the open 
plasmid, after which the conjugate is cleaved with Mbo I and Bam HI to permit 
ligation and closing of the plasmid. The plasmid is then amplified and isolated and 
used in accordance with the invention. 

Example 3 

Changes in Gene Expression Profiles in Liver Tissue of Rats 
Exposed to Various Xenobiotic Agents 
In this experiment, to test the capability of the method of the invention to 
detect genes induced as a result of exposure to xenobiotic compounds, the gene 
expression profile of rat liver tissue is examined following administration of several 
compounds known to induce the expression of cytochrome P-450 isoenzymes. The 
results obtained from the method of the invention are compared to results obtained 
from reverse transcriptase PCR measurements and immunochemical measurements of 
the cytochrome P-450 isoenzymes. Protocols and materials for the latter assays are 
described in Morris et al, Biochemical Pharmacology, 52: 781-792 (1996). 

Male Sprague-Dawley rats between the ages of 6 and 8 weeks and weighing 
200-300 g are used, and food and water are available to the animals ad lib. Test 
compounds are phenobarbital (PB), metyrapone (MET), dexamethasone (DEX), 
clofibrate (CLO), corn oil (CO), and β-naphthoflavone (BNF), and are available from 
Sigma Chemical Co. (St. Louis, MO). Antibodies against specific P-450 enzymes are 
available from the following sources: rabbit anti-rat CYP3A1 from Human Biologics, 
Inc. (Phoenix, AZ); goat anti-rat CYP4A1 from Daiichi Pure Chemicals Co. (Tokyo, 
Japan); monoclonal mouse anti-rat CYP1A1, monoclonal mouse anti-rat CYP2C11, 
goat anti-rat CYP2E1, and monoclonal mouse anti-rat CYP2B1 from Oxford 
Biochemical Research, Inc. (Oxford, MI). Secondary antibodies (goat anti-rabbit IgG, 
rabbit anti-goat IgG and goat anti-mouse IgG) are available from Jackson 
ImmunoResearch Laboratories (West Grove, PA). 

Animals are administered either PB (100 mg/kg), BNF (100 mg/kg), MET 
(100 mg/kg), DEX (100 mg/kg), or CLO (250 mg/kg) for 4 consecutive days via 
intraperitoneal injection following a dosing regimen similar to that described by 
Wang et al, Arch. Biochem. Biophys., 290: 355-361 (1991). Animals treated with 
H2O and CO are used as controls. Two hours following the last injection (day 4), 
animals are killed, and the livers are removed. Livers are immediately frozen and 
stored at -70°C. 

Total RNA is prepared from frozen liver tissue using a modification of the 
method described by Xie et al, Biotechniques, 11: 326-327 (1991). Approximately 
100-200 mg of liver tissue is homogenized in the RNA extraction buffer described by 
Xie et al to isolate total RNA. The resulting RNA is reconstituted in 
diethylpyrocarbonate-treated water, quantified spectrophotometrically at 260 nm, and 
adjusted to a concentration of 100 µg/ml. Total RNA is stored in 
diethylpyrocarbonate-treated water for up to 1 year at -70°C without any apparent 
degradation. RT-PCR and sequencing are performed on samples from these 
preparations. 

For sequencing, samples of RNA corresponding to about 0.5 µg of poly(A)+ 
RNA are used to construct libraries of tag-cDNA conjugates following the protocol 
described in the section entitled "Attaching Tags to Polynucleotides for Sorting onto 
Solid Phase Supports," with the following exception: the tag repertoire is constructed 
from six 4-nucleotide words from Table II. Thus, the complexity of the repertoire is 
8^6 or about 2.6 x 10^5. For each tag-cDNA conjugate library constructed, ten samples 
of about ten thousand clones are taken for amplification and sorting. Each of the 
amplified samples is separately applied to a fixed monolayer of about 10^6 10 µm 
diameter GMA beads containing tag complements. That is, the "sample" of tag 
complements in the GMA bead population on each monolayer is about four fold the 
total size of the repertoire, thus ensuring there is a high probability that each of the 
sampled tag-cDNA conjugates will find its tag complement on the monolayer. After 
the oligonucleotide tags of the amplified samples are rendered single stranded as 
described above, the tag-cDNA conjugates of the samples are separately applied to the 
monolayers under conditions that permit specific hybridization only between 
oligonucleotide tags and tag complements forming perfectly matched duplexes. 
Concentrations of the amplified samples and hybridization times are selected to 
permit the loading of about 5 x 10^4 to 2 x 10^5 tag-cDNA conjugates on each bead 
where perfect matches occur. After ligation, 9-12 nucleotide portions of the attached 
cDNAs are determined in parallel by the single base sequencing technique described 
by Brenner in International patent application PCT/US95/03678. Frequency 
distributions for the gene expression profiles are assembled from the sequence 
information obtained from each of the ten samples. 
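
The repertoire arithmetic in the preceding paragraph can be reproduced as follows. This is a sketch under stated assumptions: eight distinct words are taken to be available at each of the six tag positions, the monolayer is taken to hold about 10^6 beads, and each bead is modeled as carrying a tag complement drawn uniformly and independently from the repertoire. The function names are hypothetical.

```python
import math


def repertoire_complexity(words_per_position, positions):
    """Number of distinct tags when each position holds one of a fixed
    set of words."""
    return words_per_position ** positions


def probability_complement_present(n_beads, complexity):
    """Probability that a given tag complement appears on at least one
    bead, assuming each bead's complement is drawn uniformly and
    independently from the repertoire."""
    # (1 - 1/complexity) ** n_beads, computed stably in log space
    return 1.0 - math.exp(n_beads * math.log1p(-1.0 / complexity))


if __name__ == "__main__":
    complexity = repertoire_complexity(8, 6)
    beads = 10 ** 6
    print(complexity)
    print(round(beads / complexity, 2))
    print(round(probability_complement_present(beads, complexity), 3))
```

With these inputs the bead population is a little under four times the repertoire size, and under the independence assumption any particular tag complement is present on the monolayer with a probability of roughly 0.98.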

RT-PCRs of selected mRNAs corresponding to cytochrome P-450 genes and 
the constitutively expressed cyclophilin gene are carried out as described in Morris et 
al (cited above). Briefly, a 20 µL reaction mixture is prepared containing 1x reverse 
transcriptase buffer (Gibco BRL), 10 nM dithiothreitol, 0.5 nM dNTPs, 2.5 µM oligo 
d(T)15 primer, 40 units RNasin (Promega, Madison, WI), 200 units RNase H- reverse 
transcriptase (Gibco BRL), and 400 ng of total RNA (in diethylpyrocarbonate-treated 
water). The reaction is incubated for 1 hour at 37°C followed by inactivation of the 
enzyme at 95°C for 5 min. The resulting cDNA is stored at -20°C until used. For 
PCR amplification of cDNA, a 10 µL reaction mixture is prepared containing 10x 
polymerase reaction buffer, 2 mM MgCl2, 1 unit Taq DNA polymerase (Perkin- 
Elmer, Norwalk, CT), 20 ng cDNA, and 200 nM concentration of the 5' and 3' 
specific PCR primers of the sequences described in Morris et al (cited above). PCRs 
are carried out in a Perkin-Elmer 9600 thermal cycler for 23 cycles using melting, 
annealing, and extension conditions of 94°C for 30 sec, 56°C for 1 min., and 72°C 
for 1 min., respectively. Amplified cDNA products are separated by PAGE using 5% 
native gels. Bands are detected by staining with ethidium bromide. 

Western blots of the liver proteins are carried out using standard protocols 
after separation by SDS-PAGE. Briefly, proteins are separated on 10% SDS-PAGE 
gels under reducing conditions and immunoblotted for detection of P-450 isoenzymes 
using a modification of the methods described in Harris et al, Proc. Natl. Acad. Sci., 
88: 1407-1410 (1991). Proteins are loaded at 50 µg/lane and resolved under constant 
current (250 V) for approximately 4 hours at 2°C. Proteins are transferred to 
nitrocellulose membranes (Bio-Rad, Hercules, CA) in 15 mM Tris buffer containing 
120 mM glycine and 20% (v/v) methanol. The nitrocellulose membranes are blocked 
with 2.5% BSA and immunoblotted for P-450 isoenzymes using primary monoclonal 
and polyclonal antibodies and secondary alkaline phosphatase conjugated anti-IgG. 
Immunoblots are developed with the Bio-Rad alkaline phosphatase substrate kit. 

The three types of measurements of P-450 isoenzyme induction showed 
substantial agreement. 

APPENDIX Ia 

Exemplary computer program for generating 
minimally cross hybridizing sets 
(single stranded tag/single stranded tag complement) 



      Program minxh
c
c     For each of the 81 possible 4-mer starting subunits, builds a
c     set of subunits that differ pairwise by at least ndiff
c     nucleotides and writes the set to file sub4.dat.
c
      integer*2 sub1(6),mset1(1000,6),mset2(1000,6)
      dimension nbase(6)
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
  100 format(i1)
c
      open(1,file='sub4.dat',form='formatted',status='new')
c
      nset=0
c
      do 7000 m1=1,3
      do 7000 m2=1,3
      do 7000 m3=1,3
      do 7000 m4=1,3
      sub1(1)=m1
      sub1(2)=m2
      sub1(3)=m3
      sub1(4)=m4
c
      ndiff=3
c
c     Generate set of subunits differing from
c     sub1 by at least ndiff nucleotides.
c     Save in mset1.
c     jj counts the subunits stored in mset1.
c
      jj=1
      do 900 j=1,nsub
  900 mset1(1,j)=sub1(j)
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
c
      nbase(1)=k1
      nbase(2)=k2
      nbase(3)=k3
      nbase(4)=k4
c
      n=0
      do 1200 j=1,nsub
      if(sub1(j).eq.1 .and. nbase(j).ne.1 .or.
     1   sub1(j).eq.2 .and. nbase(j).ne.2 .or.
     2   sub1(j).eq.3 .and. nbase(j).ne.3) then
        n=n+1
      endif
 1200 continue
c
c     If number of mismatches
c     is greater than or equal
c     to ndiff then record
c     subunit in matrix mset1
c
      if(n.ge.ndiff) then
        jj=jj+1
        do 1100 i=1,nsub
 1100   mset1(jj,i)=nbase(i)
      endif
c
 1000 continue
c
      do 1325 j2=1,nsub
      mset2(1,j2)=mset1(1,j2)
 1325 mset2(2,j2)=mset1(2,j2)
c
c     Compare subunit 2 from
c     mset1 with each successive
c     subunit in mset1, i.e. 3,
c     4,5, ... etc. Save those
c     with mismatches .ge. ndiff
c     in matrix mset2 starting at
c     position 2.
c     Next transfer contents
c     of mset2 into mset1 and
c     start
c     comparisons again this time
c     starting with subunit 3.
c     Continue until all subunits
c     undergo the comparisons.
c
      npass=0
c
 1700 continue
      kk=npass+2
      npass=npass+1
c
      do 1500 m=npass+2,jj
      n=0
      do 1600 j=1,nsub
      if(mset1(npass+1,j).eq.1.and.mset1(m,j).ne.1.or.
     1   mset1(npass+1,j).eq.2.and.mset1(m,j).ne.2.or.
     2   mset1(npass+1,j).eq.3.and.mset1(m,j).ne.3) then
        n=n+1
      endif
 1600 continue
      if(n.ge.ndiff) then
        kk=kk+1
        do 1625 i=1,nsub
 1625   mset2(kk,i)=mset1(m,i)
      endif
 1500 continue
c
c     kk is the number of subunits
c     stored in mset2
c
c     Transfer contents of mset2
c     into mset1 for next pass.
c
      do 2000 k=1,kk
      do 2000 m=1,nsub
 2000 mset1(k,m)=mset2(k,m)
      if(kk.lt.jj) then
        jj=kk
        goto 1700
      endif
c
      nset=nset+1
      write(1,7009)
 7009 format(/)
      do 7008 k=1,kk
 7008 write(1,7010)(mset1(k,m),m=1,nsub)
 7010 format(4i1)
      write(*,*)
      write(*,120) kk,nset
  120 format(1x,'Subunits in set=',i5,2x,'Set No=',i5)
 7000 continue
      close(1)
c
      end
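
The selection rule used in the program above can be restated compactly. The following is a greedy sketch in the same spirit, not a line-by-line port of the Fortran: candidates are enumerated in lexicographic order and kept only if they differ from every subunit already kept in at least `ndiff` positions. The function names, the letter alphabet "AGT" (standing in for the numeric codes 1 to 3), and the default `ndiff=3` are illustrative assumptions.

```python
from itertools import product


def mismatches(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def minimally_cross_hybridizing_set(seed, alphabet="AGT", ndiff=3):
    """Greedily collect words that differ from the seed, and from each
    other, in at least `ndiff` positions."""
    chosen = [seed]
    for candidate in product(alphabet, repeat=len(seed)):
        word = "".join(candidate)
        if all(mismatches(word, kept) >= ndiff for kept in chosen):
            chosen.append(word)
    return chosen


if __name__ == "__main__":
    words = minimally_cross_hybridizing_set("AAAA")
    print(len(words), words)
```

A greedy pass of this kind depends on the enumeration order and on the seed, so different seeds may give sets of different sizes; running it for every possible seed, as the outer loops of the Fortran program do, allows the largest set to be selected.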

APPENDIX Ib 

Exemplary computer program for generating 
minimally cross hybridizing sets 
(single stranded tag/single stranded tag complement) 



      Program tagN
c
c     Program tagN generates minimally cross-hybridizing
c     sets of subunits given i) N--subunit length, and ii)
c     an initial subunit sequence. tagN assumes that only
c     3 of the four natural nucleotides are used in the tags.
c
      character*1 sub1(20)
      integer*2 mset(10000,20), nbase(20)
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
  100 format(i2)
c
      write(*,*)'ENTER SUBUNIT SEQUENCE'
      read(*,110)(sub1(k),k=1,nsub)
  110 format(20a1)
c
      ndiff=10
c
c     Let a=1 c=2 g=3 & t=4
c
      do 800 kk=1,nsub
      if(sub1(kk).eq.'a') then
        mset(1,kk)=1
      endif
      if(sub1(kk).eq.'c') then
        mset(1,kk)=2
      endif
      if(sub1(kk).eq.'g') then
        mset(1,kk)=3
      endif
      if(sub1(kk).eq.'t') then
        mset(1,kk)=4
      endif
  800 continue
c
c     Generate set of subunits differing from
c     sub1 by at least ndiff nucleotides.
c     jj counts the subunits stored in mset.
c
      jj=1
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
      do 1000 k5=1,3
      do 1000 k6=1,3
      do 1000 k7=1,3
      do 1000 k8=1,3
      do 1000 k9=1,3
      do 1000 k10=1,3
      do 1000 k11=1,3
      do 1000 k12=1,3
      do 1000 k13=1,3
      do 1000 k14=1,3
      do 1000 k15=1,3
      do 1000 k16=1,3
      do 1000 k17=1,3
      do 1000 k18=1,3
      do 1000 k19=1,3
      do 1000 k20=1,3
c
      nbase(1)=k1
      nbase(2)=k2
      nbase(3)=k3
      nbase(4)=k4
      nbase(5)=k5
      nbase(6)=k6
      nbase(7)=k7
      nbase(8)=k8
      nbase(9)=k9
      nbase(10)=k10
      nbase(11)=k11
      nbase(12)=k12
      nbase(13)=k13
      nbase(14)=k14
      nbase(15)=k15
      nbase(16)=k16
      nbase(17)=k17
      nbase(18)=k18
      nbase(19)=k19
      nbase(20)=k20
c
c     Reject the candidate if it has fewer than ndiff
c     mismatches with any subunit already stored.
c
      do 1250 nn=1,jj
      n=0
      do 1200 j=1,nsub
      if(mset(nn,j).eq.1 .and. nbase(j).ne.1 .or.
     1   mset(nn,j).eq.2 .and. nbase(j).ne.2 .or.
     2   mset(nn,j).eq.3 .and. nbase(j).ne.3 .or.
     3   mset(nn,j).eq.4 .and. nbase(j).ne.4) then
        n=n+1
      endif
 1200 continue
c
      if(n.lt.ndiff) then
        goto 1000
      endif
 1250 continue
c
      jj=jj+1
      write(*,130)(nbase(i),i=1,nsub),jj
      do 1100 i=1,nsub
        mset(jj,i)=nbase(i)
 1100 continue
c
 1000 continue
c
      write(*,*)
  130 format(10x,20(1x,i1),5x,i5)
      write(*,*)
      write(*,120) jj
  120 format(1x,'Number of words=',i5)
c
      end

APPENDIX Ic 

Exemplary computer program for generating 
minimally cross hybridizing sets 
(double stranded tag/single stranded tag complement) 



      Program 3tagN
c
c     Program 3tagN generates minimally cross-hybridizing
c     sets of duplex subunits given i) N--subunit length,
c     and ii) an initial homopurine sequence.
c
      character*1 sub1(20)
      integer*2 mset(10000,20), nbase(20)
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
100   format(i2)
c
      write(*,*)'ENTER SUBUNIT SEQUENCE a & g only'
      read(*,110)(sub1(k),k=1,nsub)
110   format(20a1)
c
c     ndiff is the minimum number of mismatched positions required
c     between a candidate and every subunit already in the set.
      ndiff=10
c
c     Let a=1 and g=2
c
      do 800 kk=1,nsub
        if(sub1(kk).eq.'a') then
          mset(1,kk)=1
        endif
        if(sub1(kk).eq.'g') then
          mset(1,kk)=2
        endif
800   continue
c
c     The initial subunit is the first member of the set.
      jj=1
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
      do 1000 k5=1,3
      do 1000 k6=1,3
      do 1000 k7=1,3
      do 1000 k8=1,3
      do 1000 k9=1,3
      do 1000 k10=1,3
      do 1000 k11=1,3
      do 1000 k12=1,3
      do 1000 k13=1,3
      do 1000 k14=1,3
      do 1000 k15=1,3
      do 1000 k16=1,3
      do 1000 k17=1,3
      do 1000 k18=1,3
      do 1000 k19=1,3
      do 1000 k20=1,3
c
      nbase(1)=k1
      nbase(2)=k2
      nbase(3)=k3
      nbase(4)=k4
      nbase(5)=k5
      nbase(6)=k6
      nbase(7)=k7
      nbase(8)=k8
      nbase(9)=k9
      nbase(10)=k10
      nbase(11)=k11
      nbase(12)=k12
      nbase(13)=k13
      nbase(14)=k14
      nbase(15)=k15
      nbase(16)=k16
      nbase(17)=k17
      nbase(18)=k18
      nbase(19)=k19
      nbase(20)=k20
c
c     Compare the candidate nbase with every subunit saved so far.
      do 1250 nn=1,jj
c
      n=0
      do 1200 j=1,nsub
        if(mset(nn,j).eq.1 .and. nbase(j).ne.1 .or.
     1     mset(nn,j).eq.2 .and. nbase(j).ne.2 .or.
     2     mset(nn,j).eq.3 .and. nbase(j).ne.3 .or.
     3     mset(nn,j).eq.4 .and. nbase(j).ne.4) then
          n=n+1
        endif
1200  continue
c
c     Reject the candidate if it is too similar to a saved subunit.
      if(n.lt.ndiff) then
        goto 1000
      endif
1250  continue
c
c     The candidate passed every comparison: save it.
      jj=jj+1
      write(*,130)(nbase(i),i=1,nsub),jj
      do 1100 i=1,nsub
        mset(jj,i)=nbase(i)
1100  continue
c
1000  continue
c
      write(*,*)
130   format(10x,20(1x,i1),5x,i5)
      write(*,*)
      write(*,120) jj
120   format(1x,'Number of words=',i5)
c
      end
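
The selection rule used by Program 3tagN can be sketched compactly in Python. This is a minimal illustration, not part of the original appendix: the function names, the `alphabet` and `min_diff` parameters, and the use of `itertools.product` in place of the twenty nested loops are all assumptions made for readability. A candidate word is kept only if it differs from every word already accepted in at least `min_diff` positions, which is the test the Fortran performs with `ndiff`.

```python
from itertools import product


def mismatches(a, b):
    """Count the positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def minimally_cross_hybridizing_set(seed, alphabet=(1, 2, 3), min_diff=3):
    """Greedy selection in the spirit of Program 3tagN (hypothetical helper).

    seed      -- initial subunit, encoded as integers (e.g. a=1, g=2)
    alphabet  -- symbols tried at each position (assumed 1..3, as in the loops)
    min_diff  -- minimum number of mismatches required against every
                 accepted word (the role played by ndiff in the Fortran)
    """
    accepted = [tuple(seed)]
    # product() varies the last position fastest, like the innermost loop.
    for candidate in product(alphabet, repeat=len(seed)):
        if all(mismatches(candidate, word) >= min_diff for word in accepted):
            accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    words = minimally_cross_hybridizing_set((1, 1, 2, 2), min_diff=3)
    for word in words:
        print(" ".join(str(base) for base in word))
    print("Number of words =", len(words))
```

Because the search is greedy, the result depends on the seed and on the enumeration order; it yields a valid set under the mismatch rule, not necessarily the largest possible one.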






SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: David W. Martin, Jr. 

(ii) TITLE OF INVENTION: Measurement of gene expression profiles



(iii) NUMBER OF SEQUENCES: 7 



(iv) CORRESPONDENCE ADDRESS: 

(C) CITY: Hayward 

(D) STATE: California 

(E) COUNTRY: USA 

(F) ZIP: 94545 



(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: 3.5 inch diskette

(B) COMPUTER: IBM compatible

(C) OPERATING SYSTEM: Windows 3.1

(D) SOFTWARE: Microsoft Word 5.1



(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/US96/09513

(B) FILING DATE: 06-JUN-96 



(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: PCT/US95/12791

(B) FILING DATE: 12-OCT-95 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Stephen C. Macevicz 

(B) REGISTRATION NUMBER: 30,285 

(C) REFERENCE/DOCKET NUMBER: 813wo 



(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (510) 670-9365 

(B) TELEFAX: (510) 670-9302 



(2) INFORMATION FOR SEQ ID NO: 1: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 nucleotides

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



CTAGTCGACC A 



(2) INFORMATION FOR SEQ ID NO: 2: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:



NRRGATCYNN N 



(2) INFORMATION FOR SEQ ID NO: 3:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 nucleotides 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



GAGGATGCCT TTATGGATCC ACTCGAGATC CCAATCCA 



(2) INFORMATION FOR SEQ ID NO: 4: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 nucleotides 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:



AGTGGCTGGG CATCGGACCG 



(2) INFORMATION FOR SEQ ID NO: 5: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 nucleotides 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
GGGGCCCAGT CAGCGTCGAT 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
ATCGACGCTG ACTGGGCCCC 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 62 nucleotides

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

AAAAGGAGGA GGCCTTGATA GAGAGGACCT GTTTAAACGG ATCCTCTTCC 50
TCTTCCTCTT CC 62
I claim:

1. A method of determining the toxicity of a compound, the method comprising
the steps of:

administering the compound to a test organism;

extracting a population of mRNA molecules from each of one or more tissues 
of the test organism; 

forming a separate population of cDNA molecules from each population of 
mRNA molecules from the one or more tissues such that each cDNA molecule of a 
separate population has an oligonucleotide tag attached, the oligonucleotide tags
being selected from the same minimally cross-hybridizing set; 

separately sampling each population of cDNA molecules such that 
substantially all different cDNA molecules within a separate population have different 
oligonucleotide tags attached; 

sorting the cDNA molecules of each separate population by specifically 
hybridizing the oligonucleotide tags with their respective complements, the respective 
complements being attached as uniform populations of substantially identical 
complements in spatially discrete regions on one or more solid phase supports; 

determining the nucleotide sequence of a portion of each of the sorted cDNA 
molecules of each separate population to form a frequency distribution of expressed 
genes for each of the one or more tissues; and 

correlating the frequency distribution of expressed genes in each of the one or 
more tissues with the toxicity of the compound. 

2. The method of claim 1 wherein said oligonucleotide tag and said complement 
of said oligonucleotide tag are single stranded. 

3. The method of claim 2 wherein said oligonucleotide tag consists of a plurality 
of subunits, each subunit consisting of an oligonucleotide of 3 to 9 nucleotides in 
length and each subunit being selected from the same minimally cross-hybridizing set. 

4. The method of claim 3 wherein said one or more solid phase supports are 
microparticles and wherein said step of sorting said cDNA molecules onto the 
microparticles produces a subpopulation of loaded microparticles and a subpopulation 
of unloaded microparticles. 

5. The method of claim 4 further including a step of separating said loaded 
microparticles from said unloaded microparticles. 

6. The method of claim 5 further including a step of repeating said steps of
sampling, sorting, and separating until a number of said loaded microparticles is
accumulated is at least 10,000.

7. The method of claim 6 wherein said number of loaded microparticles is at
least 100,000.

8. The method of claim 7 wherein said number of loaded microparticles is at
least 500,000.



9. The method of claim 5 further including a step of repeating said steps of 
sampling, sorting, and separating until a number of said loaded microparticles is 
accumulated is sufficient to estimate the relative abundance of a cDNA molecule 
present in said population at a frequency within the range of from 0.1% to 5% with a 
95% confidence limit no larger than 0.1% of said population. 






10. The method of claim 4 wherein said test organism is a mammalian tissue 
culture.



11. The method of claim 10 wherein said mammalian tissue culture comprises 
hepatocytes. 

12. The method of claim 4 wherein said test organism is an animal selected from
the group consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and
monkeys.

13. The method of claim 12 wherein said one or more tissues are selected from the
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large
intestine, small intestine, pancreas, urinary bladder, stomach, ovary, testes, and
mesenteric lymph nodes.






14. A method of identifying genes which are differentially expressed in a selected 
tissue of a test animal after treatment with a compound, the method comprising the 
steps of: 

administering the compound to a test animal; 

extracting a population of mRNA molecules from the selected tissue of the
test animal;

forming a population of cDNA molecules from the population of mRNA
molecules such that each cDNA molecule has an oligonucleotide tag attached, the
oligonucleotide tags being selected from the same minimally cross-hybridizing set;

sampling the population of cDNA molecules such that substantially all
different cDNA molecules have different oligonucleotide tags attached;

sorting the cDNA molecules by specifically hybridizing the oligonucleotide
tags with their respective complements, the respective complements being attached as
uniform populations of substantially identical complements in spatially discrete
regions on one or more solid phase supports;

determining the nucleotide sequence of a portion of each of the sorted cDNA
molecules to form a frequency distribution of expressed genes; and

identifying genes expressed in response to administering the compound by
comparing the frequency distribution of expressed genes of the selected tissue of the
test animal with a frequency distribution of expressed genes of the selected tissue of a
control animal.

15. The method of claim 14 wherein said oligonucleotide tag and said
complement of said oligonucleotide tag are single stranded.

16. The method of claim 15 wherein said oligonucleotide tag consists of a
plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 9
nucleotides in length and each subunit being selected from the same minimally
cross-hybridizing set.

17. The method of claim 16 wherein said one or more solid phase supports are
microparticles and wherein said step of sorting said cDNA molecules onto the
microparticles produces a subpopulation of loaded microparticles and a subpopulation
of unloaded microparticles.

18. The method of claim 17 further including a step of separating said loaded
microparticles from said unloaded microparticles.

19. The method of claim 18 further including a step of repeating said steps of
sampling, sorting, and separating until a number of said loaded microparticles is
accumulated is at least 10,000.

20. The method of claim 19 wherein said number of loaded microparticles is at
least 100,000.

21. The method of claim 20 wherein said number of loaded microparticles is at
least 500,000.

22. The method of claim 18 further including a step of repeating said steps of
sampling, sorting, and separating until a number of said loaded microparticles is
accumulated is sufficient to estimate the relative abundance of a cDNA molecule
present in said population at a frequency within the range of from 0.1% to 5% with a
95% confidence limit no larger than 0.1% of said population.

23. The method of claim 17 wherein said test animal is selected from the group
consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and monkeys.

24. The method of claim 23 wherein said selected tissue is selected from the
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large
intestine, small intestine, pancreas, urinary bladder, stomach, ovary, testes, and
mesenteric lymph nodes.

25. A use of the technique of massively parallel signature sequencing to determine 
the toxicity of a compound in a test organism, the use comprising the steps of: 

administering the compound to a test organism; 

extracting a population of mRNA molecules from each of one or more tissues 
of the test organism and forming a population of cDNA molecules for each of the one 
or more tissues; 

determining the nucleotide sequence of a portion of each of the cDNA 
molecules of each separate population using massively parallel signature sequencing 
to form a frequency distribution of expressed genes for each of the one or more 
tissues; and 

correlating the frequency distribution of expressed genes in each of the one or 
more tissues with the toxicity of the compound. 

26. The use of claim 25 wherein said test organism is a mammalian tissue culture. 

27. The use of claim 26 wherein said mammalian tissue culture comprises 
hepatocytes. 

28. The use of claim 25 wherein said test organism is an animal selected from the
group consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and
monkeys.

29. The use of claim 28 wherein said one or more tissues are selected from the
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large
intestine, small intestine, pancreas, urinary bladder, stomach, ovary, testes, and
mesenteric lymph nodes. 

30. A use of the technique of massively parallel signature sequencing to identify 
genes which are differentially expressed in a test organism after treatment with a 
compound and which are correlated with toxicity of the compound, the use 
comprising the steps of: 

administering the compound to the test organism; 

extracting a population of mRNA molecules from a selected tissue of the test 
organism and forming a population of cDNA molecules; 

determining the nucleotide sequence of a portion of each of the cDNA 
molecules using massively parallel signature sequencing to form a frequency 
distribution of expressed genes;

identifying genes expressed in response to administering the compound by
comparing the frequency distribution of expressed genes of the selected tissue of the
test organism with a frequency distribution of expressed genes of the selected tissue 
of a control organism; and 

determining whether the genes expressed in response to administering the 
compound are correlated with toxicity of the compound in the test organism. 
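
The claims above rely on two calculations: forming and comparing frequency distributions of expressed genes (claims 14 and 30), and accumulating enough sequenced microparticles to bound the error on an abundance estimate (claims 9 and 22). The following Python sketch illustrates both under stated assumptions and is not taken from the application. The function names, the two-fold threshold, the `floor` pseudo-frequency, and the reading of the "95% confidence limit" as a normal-approximation half-width are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import ceil


def frequency_distribution(signatures):
    """Relative frequency of each sequenced signature in one sample."""
    counts = Counter(signatures)
    total = sum(counts.values())
    return {sig: n / total for sig, n in counts.items()}


def differentially_expressed(test, control, min_fold=2.0, floor=1e-6):
    """Signatures whose test/control frequency ratio is at least min_fold
    in either direction. `floor` avoids division by zero for signatures
    seen in only one sample (an assumption, not from the source)."""
    hits = {}
    for sig in set(test) | set(control):
        ratio = max(test.get(sig, 0.0), floor) / max(control.get(sig, 0.0), floor)
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hits[sig] = ratio
    return hits


def sample_size(p, half_width, z=1.96):
    """Number of sequenced cDNAs needed so that the normal-approximation
    95% interval on a frequency p has half-width <= half_width."""
    return ceil(z * z * p * (1.0 - p) / (half_width ** 2))


if __name__ == "__main__":
    test = frequency_distribution(["AAA"] * 6 + ["CCC"] * 2 + ["GGG"] * 2)
    control = frequency_distribution(["AAA"] * 2 + ["CCC"] * 2 + ["GGG"] * 6)
    print(differentially_expressed(test, control))
    print(sample_size(0.05, 0.001))
```

Under this reading, a cDNA present at 5% with a half-width of 0.1% calls for roughly 1.8 x 10^5 sequenced molecules, which is the same order of magnitude as the 100,000 and 500,000 thresholds recited in the dependent claims.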






Fig. 1 (sheet 1/2): flow chart. Recoverable box labels:

100: Generate Table Mn of all possible subunits of desired length and composition
110: Select initial subunit Si
120: Compare subunit Si to successive subunits in Table Mn from Si+1 to end of Table
125: Save subunit in Table Mn+1
150: Replace Mn with Mn+1
Decision branches: Yes / No (decision text not recoverable)
Discard subunit
140: Go to next subunit



(Sheet 2/2: drawing not reproduced.)



INTERNATIONAL SEARCH REPORT 



International Application No. 
PCT/US96/16342

A. CLASSIFICATION OF SUBJECT MATTER
IPC(6) : C12Q 1/68; C07H 21/04
US CL : 435/6; 536/24.3
According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)

U.S. : 435/6; 536/24.3

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)
APS, MEDLINE, BIOSIS, CAPLUS, SCISEARCH

search terms: Martin, David W., toxic?, differential?, express?, cDNA, mRNA, RNA, gene#, hybrid?.



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category*: A
Citation of document, with indication, where appropriate, of the relevant passages: CHETVERIN et al. Oligonucleotide arrays: New concepts and possibilities. Bio/Technology. 12 November 1994, Vol. 12, pages 1093-1099, especially pages 1095-1096.
Relevant to claim No.: 1-30

Category*: A
Citation of document, with indication, where appropriate, of the relevant passages: BRENNER et al. Encoded combinatorial chemistry. Proceedings of the National Academy of Sciences USA. June 1992, Vol. 89, pages 5381-5383.
Relevant to claim No.: 1-30

Category*: A
Citation of document, with indication, where appropriate, of the relevant passages: MATSUBARA et al. cDNA analyses in the human genome project. Gene. 15 December 1993, Vol. 135, No. 1-2, pages 265-274.
Relevant to claim No.: 1-30



[X] Further documents are listed in the continuation of Box C.   [ ] See patent family annex.

* Special categories of cited documents:

"A" document defining the general state of the art which is not considered
to be of particular relevance

"E" earlier document published on or after the international filing date

"L" document which may throw doubts on priority claim(s) or which is
cited to establish the publication date of another citation or other
special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or other
means

"P" document published prior to the international filing date but later than
the priority date claimed

"T" later document published after the international filing date or priority
date and not in conflict with the application but cited to understand the
principle or theory underlying the invention

"X" document of particular relevance; the claimed invention cannot be
considered novel or cannot be considered to involve an inventive step
when the document is taken alone

"Y" document of particular relevance; the claimed invention cannot be
considered to involve an inventive step when the document is
combined with one or more other such documents, such combination
being obvious to a person skilled in the art

"&" document member of the same patent family



Date of the actual completion of the international search 
27 JANUARY 1997 



Name and mailing address of the ISA/US 
Commissioner of Patents and Trademarks
Box PCT
Washington, D.C. 20231
Facsimile No. (703) 305-3230

Date of mailing of the international search report
19 FEB 1997

Authorized officer
SCOTT D. PRIEBE
Telephone No. (703) 308-0196



Form PCT/ISA/210 (second sheet)(July 1992)* 



INTERNATIONAL SEARCH REPORT 

International application No.
PCT/US96/16342

C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT

Category*: A
Citation of document, with indication, where appropriate, of the relevant passages: WO 95/21944 A1 (SMITHKLINE BEECHAM CORPORATION) 17 August 1995, page 4, lines 1-4, page 5, lines 31-37, page 17, lines 15-27, page 18, lines 30-35, page 20, line 23 to page 21, line 4.
Relevant to claim No.: 1-30

Form PCT/ISA/210 (continuation of second sheet)(July 1992)*



Docket No.: PF-0300-3 CON 
USSN: 09/745,506 
Ref. No. j_of 19 



FOCUS - 17 of 19 DOCUMENTS 

Copyright 1997 PR Newswire Association, Inc. 
PR Newswire 

August 11, 1997, Monday 

SECTION: Financial News 

DISTRIBUTION: TO BUSINESS AND MEDICAL EDITORS 
LENGTH: 478 words 

Eli Lilly & Co. and Acacia Biosciences Enter Into Research Collaboration;
First Corporate Agreement for Acacia's Genome Reporter Matrix(TM)

DATELINE: RICHMOND, Calif., Aug. 11

BODY: 

[...] uses yeast [...] gene expression. Because of the similarities between
the yeast and human genome, the [...] effects
induced by a biologically active [...]

[...] innovative GRM has
the potential to provide enormous [...]
process more rational. [...]

[...] chemical response profiles and genetic response profiles. The
[...] by potential therapeutics [...] then rank genes
[...] changes in gene expression
caused by mutations in the gene [...] profiles represent
gold standards in drug discovery [...] for drugs with perfect selectivity and
specificity. By comparing the two profiles, one [...] a potential drug candidate's ability to mimic the action of
a 'perfect' drug.

[...] developing proprietary technologies to [...] speed
[...] in genomic and [...] comprehensive profiles of drug candidates' in vivo activity.

SOURCE Acacia Biosciences

CONTACT: [...] Cohen, [...] CEO of Acacia Biosciences, 510-669-2330 ext. [illegible]03 or Media: Linda [...]



LOAD-DATE: August 12, 1997 



The Bioreactor Market:

Steady Growth Expected



Docket No.: PF-0300-3 CON 
USSN: 09/745,506 
Ref.No. 9 of 19 



ENGINEERING 

NEWS 








Pharmagene Raises More Capital for Research on Human Tissues

By Sophia Fox

Pharmagene, the Royston, UK-based biopharmaceutical company specialising in the use of human biomaterials for drug discovery research, has raised a further £5 million from a group of investors led by 3i and Abacus Nominees. The funding will enable the company to expand both its human biomaterials collection and its capabilities across a range of proprietary platform technologies.

Gordon Baxter, Ph.D., Pharmagene's cofounder and chief operating officer, claimed "by the end of this year Pharmagene will have access to the largest collection of human RNAs and proteins anywhere in the world, and a range of innovative, yet robust technologies
SEE PHARMAGENE, P. 10
Perkin-Elmer Acquires PerSeptive to Expand Its Capabilities in Gene-Based Drug Discovery

By John Sterling

Perkin-Elmer's (PE; Norwalk, CT) decision last month to acquire PerSeptive Biosystems (Framingham, MA) via a $360 million stock swap was designed to strengthen PE in terms of broad capabilities in gene-based drug discovery. The company's main goal is to develop new products to improve the integration of genetic and protein research.

"This merger will enhance our position as an effective provider of innovative, integrated platforms enabling our customers to be more efficient and cost-effective in bringing new pharmaceuticals to market," says Tony L. White, PE's chairman, president and CEO. "The combination of our two companies should bolster our presence in the life sciences, [and it is our] belief that we must take bold action now to lead the emerging era of molecular medicine with leading positions in both genetic and protein analysis."

Caption: [...] PerSeptive Biosystems for $360 million to obtain new technologies in mass spectrometry, bioseparations and purification for product development projects, spanning the range from genomics to proteomics.

A driving force behind the merger is the vast amount of genetic information about human disease that is being accumulated by researchers and biotech companies working in the area of genomics. It is becoming increasingly obvious that these data need to be complemented with technologies for



FDA OKs Genzyme's Carticel Product for Damage to Knees

Caption: Carticel, which was approved "for the repair of clinically significant, symptomatic cartilaginous defects of the femoral condyle (medial, lateral or trochlear) caused by acute or repetitive trauma," employs a proprietary process to grow autologous cartilage cells for implantation. (Figure label: Periosteal flap.)

By Naomi Pfeiffer

The FDA has approved a knee-cartilage replacement product made by Genzyme Tissue Repair (Cambridge, MA), a tracking-stock division of Genzyme Corp., for people with trauma-damaged knees.

Carticel(TM) (autologous cultured chondrocytes) is the first product to be licensed under the FDA's pro-
SEE GENZYME, P. 6
studying proteins and protein networks, a field known as proteomics (see GEN, September 1, 1997, p. 1).

PE officials, who claim that MALDI-TOF (Matrix Assisted
SEE ACQUISITION, P. 10




By Vicki Glaser

Acacia Biosciences (Richmond, CA) last month announced its first agreement with a major pharmaceutical company, signing a deal with Eli Lilly (Indianapolis, IN) to use Acacia's Genome Reporter Matrix (GRM) to select and optimize some of Lilly's lead compounds. Acacia's yeast-based system for profiling drug activity is useful for evaluating the therapeutic potential of lead compounds, and it also has a role in the identification and validation of new drug targets.

"We're using the ecosystem of a cell to allow us to deduce the mechanism of action and target for any chemical," explains Bruce Cohen, president and CEO. "We screen for every target in a cell simultaneously...using transcription as a readout for how a cell is adapting to any perturbation," he says.

The GRM technology consists of two main databases: one is the genetic response profile, showing the effects of mutations in each individual yeast gene and compensatory gene regulatory mechanisms; the other is the chemical response profile, which documents changes in gene expression in response to chemical compounds. Computational analysis and pattern matching between the genetic and chemical profiles yields information on the specificity, potency and side-effects risk of a drug lead.
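
The article does not say how the pattern matching between genetic and chemical response profiles is computed. As a purely illustrative sketch, one simple way to compare such profiles is to correlate a compound's expression-change vector against the vector produced by each gene mutation and rank the mutations by similarity. Everything here is an assumption for illustration: the function names, the use of Pearson correlation, and the toy profiles are not drawn from Acacia's system.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no pattern to match
    return cov / (sd_x * sd_y)


def rank_targets(chemical_profile, genetic_profiles):
    """Rank gene mutations by how closely their expression signature
    resembles the compound's signature (most similar first)."""
    scores = {gene: pearson(chemical_profile, profile)
              for gene, profile in genetic_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy log-ratio expression changes across four reporter genes.
    compound = [2.0, -1.0, 0.5, 0.0]
    mutants = {
        "geneA": [4.0, -2.0, 1.0, 0.0],   # same pattern, larger amplitude
        "geneB": [-1.0, 2.0, 0.0, 1.0],   # roughly opposite pattern
    }
    for gene, score in rank_targets(compound, mutants):
        print(gene, round(score, 3))
```

In this toy setting, a compound whose profile closely tracks a single mutant's profile would be read as acting on that gene's pathway, while strong similarity to many unrelated mutants would hint at poor specificity.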

Targeting Targets 

No longer is mapping and 
sequencing a gene — or the human 
genome— an end unto itself, but 
SEE TARGET, P. 18 



Sticky Ends

Avigen received two grants from the NIH & University of California for research on gene therapy for treatment of cancer & HIV infections...HRL Pharmaceutical Services, of Reston, VA, launched the TSH Bug Finder, which is able to locate & retrieve client-specified microorganisms in real-time...Gensia Sicor, Inc. will move its corporate staff from San Diego to Irvine, CA, by end of year...FDA accepted NDA from Sepracor for levalbuterol HCl inhalation solution...An $11.7M mezzanine financing has been closed by Activated Cell Therapy, which changed its name to Dendreon Corporation...Astra AB will build major research facility in Waltham, MA, and is also relocating Astra Arcus research facility from Rochester to Boston area...Prolifix Ltd. team used a small peptide to inhibit the E2F protein complex and induced apoptosis in mammalian tumor cells...Vertex Pharmaceuticals, Inc. and Alpha Therapeutic Corp. ended an agreement to develop VX-366 for treatment of inherited hemoglobin disorders...NaviCyte received Phase I SBIR grant for up to $100,000 from NIH for development of prototype of its NaviFlow technology for high-throughput screening...Covance Inc. will invest $21 million in expansion and renovation of its facility in Indianapolis, IN.
merely a means to an end. The critical next step is to validate the gene and its protein product as a potential drug target. The Human Genome Project continues to produce a treasure chest of expressed sequence tags (ESTs) and a tantalizing array of complete gene sequences.

Companies are applying a variety of functional genomic strategies to link genes to specific diseases and to mutagenic phenotypes. Yet the ultimate challenge for pharmaceutical companies is to sift through all the sequence and differential gene expression data to identify the best targets for drug discovery.

Spinning off technology developed at the University of North Carolina (Chapel Hill), Cytogen Corp. (Princeton, NJ) formed its wholly owned subsidiary AxCell Biosciences earlier this year. The young company is building a protein interaction database, cataloging all the interactions the modular domains of proteins can engage in with a range of ligands, in order to gain insight into protein function and to select the most critical interaction to target for drug development.

AxCell's cloning-of-ligand-targets (COLT) technology employs "recognition units" from the company's genetic diversity library (GDL) to map functional protein interactions and quantitate their affinity. The company's inter-functional proteomic database (IFP-dbase) elucidates protein interaction networks and structure-activity relationships based on ligand affinity with protein modular domains.

Defining Disease Pathways

Signal Pharmaceuticals, Inc.'s (San Diego, CA) integrated drug target and discovery effort is based on mapping gene-regulating pathways in cells and identifying small molecules that regulate the activation of those genes. In collaboration with academic researchers, the company has identified a large number of regulatory proteins in several mitogen-activated protein (MAP) kinase pathways (including the JNK, ERK and p38 signaling pathways), which Signal is evaluating for the treatment of autoimmune, inflammatory, cardiovascular and neurologic diseases, and cancer. Other target identification programs focus on the NF-kB pathway, estrogen-related genes and central/peripheral nervous system genes.

Caption: The Genome Reporter [...]

Regulating cytokine production in immune and inflammatory disorders,



and modifying bone metabolism to treat osteoporosis are the focus of Signal's collaboration with Tanabe Seiyaku (Osaka, Japan). Signal has partnered with Organon/Akzo Nobel (Netherlands) to identify estrogen-responsive genes as targets for treating neurodegenerative and psychiatric diseases, atherosclerosis and ischemia, and with Roche Bioscience (Palo Alto, CA) to develop human peripheral nerve cell lines for the discovery of treatments for pain and incontinence.

Exelixis' (S. San Francisco, CA) strategy for target selection is to define disease pathways and identify regulatory molecules that activate or inhibit those biochemical/genetic pathways. Based on the finding that these pathways are conserved across species, the company is studying the model genetic systems of Drosophila and Caenorhabditis elegans. Using its PathFinder technology, Exelixis systematically introduces mutations into the genomes of these model organisms, looking for mutations that enhance or suppress the target disease-related gene. These novel genes then become the basis of drug screening assays.

Cadus Pharmaceutical Corp. (Tarrytown, NY) is identifying surrogate ligands to newly discovered orphan G-protein coupled transmembrane receptors of unknown function to determine the suitability of the receptors as drug targets. Inserting the novel receptor in a yeast system yields a ligand that activates the receptor. Access to a surrogate ligand allows the company to screen for receptor antagonists in the yeast system.

"The antagonist plus the surrogate ligand gives you two probes, an on probe and an off probe, which allows you to look at function," explains David Webb, Ph.D., vp of research and chief scientific officer. A surrogate ligand also provides information on which G-protein interacts with the orphan receptor and its associated signaling pathways, further clarifying the role of the receptor as a potential drug target. Cadus' collaboration with SmithKline (Philadelphia) capitalizes on Cadus' ability to determine orphan receptor function, applying the technology to SmithKline's proprietary, newly discovered G-protein receptors.

Cadus' recombinant yeast system can also be used to screen cell and tissue extracts for natural ligands, and the company is accelerating its internal drug-discovery efforts in the areas of cancer, inflammation and allergy. A recent equity investment in Axiom Biotechnologies (San Diego, CA) gave Cadus a license to Axiom's high-throughput pharmacologic screening system for lead optimization and discovery.

As its name implies, Gene/Networks (Alameda, CA) focuses on identifying gene networks that contribute to multigenic phenotypes and complex disease processes. The integration of mouse and human genetic studies forms the basis of the technology. The Genome Tagged Mice database in development will serve as a library of natural mouse genetic and phenotypic variation. Disease-related genes identified in mice are then evaluated in human family- and population-based studies to confirm their clinical relevance and linkages to pathophysiologic traits.



Blocking Gene Expression
Inactivating a gene known to be
expressed in association with a particular
disease is one approach to
identifying appropriate therapeutic
targets. The target validation and discovery
program at Ribozyme
Pharmaceuticals, Inc. (Boulder,
CO) applies the company's ribozyme
technology to achieve selective inhibition
of gene expression in cell culture
and in animals.

Correlation of the gene expression
inhibition with phenotype can
suggest the relative importance of
the gene in disease pathology. The
company's nuclease-resistant
ribozymes form the basis of a collaboration
with Schering AG
(Germany) for drug target validation
and the development of ribozyme-based
therapeutic agents, and with
Chiron Corp. (Emeryville, CA) for
target validation.

With several antisense compounds 
now progressing through clinical tri- 
als, the concept of using oligonu- 
cleotides to inhibit gene activity is 
not new. But rather than focusing on 
therapeutics development, Sequitur,
Inc. (Natick, MA) is creating anti- 
sense compounds for the purpose of 
determining gene function and vali- 
dating drug targets. Clients typically 
provide the one-year-old company 
with the sequence (or EST) of a 
potential gene target and, in return,
Sequitur custom designs a series of
three to six antisense compounds that 
yield a three-to-ten-fold inhibition of 
the target gene in cell culture. The 
company also provides oligofectins, 
a series of cationic lipids, to deliver
the oligonucleotides to a variety of 
cultured cells. 

"Differential expression informa- 
tion is just for correlation, it doesn't 
tell function or confirm what would 
be a good target," says Tod Woolf,
Ph.D., director of technology
development at Sequitur. Whereas,
antisense compounds will inhibit a
target. Sequitur offers both
phosphorothioate DNA antisense
compounds and its proprietary Next
Generation chimeric oligonu- 
cleotides, which have a higher 
hybridization affinity, greater speci- 
ficity and reduced toxicity, according 
to the company. 

Mining Pathogen Genomes 

Companies such as Human
Genome Sciences (HGS; Rockville,
MD), Incyte (Palo Alto, CA),



Pangea 



AxCell Biosciences scientists say their technology enables the rapid and
simple functional identification of the two essential
components of protein interaction networks; specific recognition units that bind distinct
modular protein domains are identified and isolated using a combination
structural/functional approach that uses both peptide phage display Genetic
Diversity Libraries (GDL) and bioinformatics, and Cloning of Ligand
Targets (COLT) technology utilizes recognition units as functional probes to
isolate families of interactor proteins.



Millennium Pharmaceuticals Inc.
(Cambridge, MA) and Genome
Therapeutics (Waltham, MA) are 
relying on high-speed DNA sequenc- 
ing, positional cloning and other 
strategies to identify specific micro- 
bial genomic sites that would be 
good targets for infectious disease 
therapeutics. 

HGS recently completed sequenc- 
ing of the bacterial pathogen 
Streptococcus pneumoniae, which is 
the focus of an agreement with 
Hoffmann-La Roche (Basel, 
Switzerland). Roche will use the 
sequence data to develop new anti-infectives
against S. pneumoniae.
HGS and Roche have expanded their 
collaboration to include a nonexclu- 
sive license to access sequence infor- 
mation for the intestinal bacterium 
Enterococcus faecalis. 

Incyte Pharmaceuticals has completed
one-fold coverage of the
Candida albicans genome, identifying
60% of the genes of this fungal
pathogen. This genome will become
part of the company's PathoSeq
microbial database. Incyte recently
introduced the ZooSeq animal gene
sequence and expression database.
The database will provide genomic 
information across various species 
commonly used in preclinical drug 
testing, which may help to better 
define potential drug targets. 

Millennium Pharmaceuticals con- 
tinues to report success in identifying 
novel drug targets, having recently 
discovered a novel chemokine called
neurotactin and a new class of MAD-related
proteins that inhibit transforming
growth factor beta (TGF-β)
signaling. The company also 
received U.S. patent coverage for the 
tub genes, believed to play a role in 
obesity, and for the gene that encodes 
the protein melastatin, which appears 
to suppress metastasis in malignant
melanoma. ■ 







Smith, now a computer program- 
mer, is an expert in systems integra- 
tion, Internet technologies and the 
application of industrial engineering 
principles to the drug discovery 
process. Before co-founding Pangea, 
he was the manager of software
development at Attorneys Briefcase,
a legal research software company.

By being "in the trenches" with
customers and collaborators, 
Bellenson and Smith sensed the 
frustration of pharmaceutical 
researchers whose incompatible 
tools have impeded their progress. 
According to Bellenson, "Most of
them are geared toward analyzing
one molecule at a time. It's like
emptying the ocean with an eye
dropper, an incompatible eye dropper at
that. A pharmaceutical company
may have 10 different drug discovery
teams with various approaches.
The problem is to manage the 
process of experimenting with a lot 
of different approaches, to automate 
while maintaining flexibility." 

GeneWorld 2.1 enables "integration
of the entire target discovery and
validation process," Bellenson says.
The commercial software package 
coordinates the entire process of 
sequence-data analysis and can be 
integrated with other programs and 
databases, according to Smith, who 
adds that it handles thousands of 
sequence results, organizes and auto- 
mates annotation and seamlessly 
interacts with growing genome data- 
bases. Simple forms and menus 
enable users to turn raw sequence 
data into crucial knowledge for drug 
discovery by applying algorithms to 
sequences, creating custom analysis 
strategies and producing useful 
reports, without the need for writing 
computer code. GeneWorld 2.1 runs
on a variety of platforms and operat- 
ing systems. 

Pairing industrial relational data- 
base-management systems with a 
web-browser interface, Pangea's
Operating System of Drug
Discovery is an open-computing
framework that allows client/server 
and Java-enabled web-based tech- 
nologies to collect, organize and ana- 
lyze drug discovery information for 
pharmaceutical companies to simpli- 
fy and accelerate drug discovery. The 
technology unites automated 
genomics database analysis for drug 
target site selection, chemical infor- 
mation database analysis and large- 
scale combinatorial chemistry pro- 
ject management and high-through- 
put screening project management 
for drug lead efficacy analysis. 
Pangea officials maintain that these 
integrated elements provide a unified 
environment for chemists, biologists 
and others involved in the drug discovery
process to work together with

Bioinformaticists can design
Strategies, such as the one
shown here, that forward data
through multiple-step analyses
logically and automatically.
Researchers throughout your
organization can apply the same
Strategies to their own data.



commercial and public domain 
software. 

Pangea's Operating System of
Drug Discovery can accommodate 
Sybase, Oracle or Informix relation- 
al database-management systems 
and any version of UNIX. It absorbs 
new data formats, databases,
algorithms and analysis paradigms into
the automated workflow without 
software modifications. Netscape 
Navigator provides a friendly user
interface from PC, Macintosh, and 
UNIX workstations. 

In the near term, Pangea plans to 
complete its bioinformatics core
with two more programs. Gene 
Foundry, a sample tracking and 
workflow sequence package for 
DNA sequence and fragment infor- 
mation, will also offer interaction 
with robots, reagent tracking and 
troubleshooting. Gene Thesaurus, 
the other package, is a "warehouse
of bioinformatics data," says
Bellenson. ■

GTAC Chairman, Professor
Norman C. Nevin, said 1996 saw
"four important developments": an
increase in enquiries and submissions
made to GTAC; an increase in
the complexity of submitted protocols;
a continuing shift from gene
therapy for single-gene disorders
toward strategies aimed at tumour
destruction in cancer; and a growth
in international sponsorship of U.K.
gene therapy trials.

Since 1993, GTAC and its predecessor,
the Clothier Committee, have
approved 18 U.K. gene therapy clinical
trials (13 of which have been carried
out), which are listed in the
report. The disease areas targeted by
these trials include severe combined
immunodeficiency (1 trial), cystic
fibrosis (6), metastatic melanoma (2),
lymphoma (2), neuroblastoma (1),
breast cancer (1), Hurler's syndrome
(1), cervical cancer (1), glioblastoma



breast cancer, breast cancer with liver 
metastases, glioblastoma, malignant 
ascites due to gastrointestinal cancer 
and ovarian cancer. 

Copies of the GTAC third annual
report are available from the GTAC
Secretariat, Wellington House, 133-155
Waterloo Road, London SE1
8UG, U.K.

Coated Lenses Prevent PCO 

Scientists in the UK. say it may be 
possible to prevent posterior capsule 
opacification (PCO), a common 
complication following cataract 
surgery, by using the implanted
polymethylmethacrylate (PMMA)
intraocular lens as a drug delivery 
system. PCO occurs in 30-50% of 
cataract surgery patients as a result of 
stimulated cell growth within the 
remaining capsular bag. The condi- 
tion causes a decline in visual acuity 
and requires expensive laser treat- 
ment, thus negating the routine use of 
cataract surgery in underdeveloped 
countries, explains G. Duncan, at the

Exploring the Metabolic and Genetic Control of
Gene Expression on a Genomic Scale 

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used 
to carry out a comprehensive investigation of the temporal program of gene expression 
accompanying the metabolic shift from fermentation to respiration. The expression 
profiles observed for genes with known metabolic functions pointed to features of the 
metabolic reprogramming that occur during the diauxic shift, and the expression patterns 
of many previously uncharacterized genes provided clues to their possible functions. The 
same DNA microarrays were also used to identify genes whose expression was affected 
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcrip- 
tional activator YAP1. These results demonstrate the feasibility and utility of this ap- 
proach to genomewide exploration of gene expression patterns. 



The complete sequences of nearly a dozen
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2), 
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially
favorable organism in which to conduct a
systematic investigation of gene expression. 
The genes are easy to recognize in the ge- 
nome sequence, cis regulatory elements are 
generally compact and close to the tran- 
scription units, much is already known 
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its 
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by 



using a simple robotic printing device (9). 
Cells from an exponentially growing culture 
of yeast were inoculated into fresh medium 
and grown at 30°C for 21 hours. After an 
initial 9 hours of growth, samples were har- 
vested at seven successive 2-hour intervals, 
and mRNA was isolated (10). Fluorescently 
labeled cDNA was prepared by reverse transcription
in the presence of Cy3(green)-
or Cy5(red)-labeled deoxyuridine triphosphate
(dUTP) (11) and then hybridized to
the microarrays (12). To maximize the reliability
with which changes in expression
levels could be discerned, we labeled cDNA 
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3- 
labeled "reference" cDNA sample prepared 
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity 
measured for the Cy3 and Cy5 fluors at 
each array element provides a reliable mea- 
sure of the relative abundance of the corre- 
sponding mRNA in the two cell popula- 
tions (Fig. 1). Data from the series of seven 
samples (Fig. 2), consisting of more than 
43,000 expression-ratio measurements,
were organized into a database to facilitate 
efficient exploration and analysis of the 
results. This database is publicly available 
on the Internet (13). 
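
The two-color design described above reduces each array element to a single Cy5/Cy3 intensity ratio. The following is a minimal sketch of that calculation, not the authors' analysis software: the gene labels and intensity values are hypothetical, and the intensities are assumed to be already background-subtracted and normalized.

```python
import math


def expression_ratio(cy5, cy3):
    """Ratio of sample (Cy5) to reference (Cy3) fluorescence for one array element."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return cy5 / cy3


def log2_ratio(cy5, cy3):
    """Log2 of the ratio, so that induction and repression are symmetric around 0."""
    return math.log2(expression_ratio(cy5, cy3))


# Hypothetical (Cy5, Cy3) intensities for three array elements.
spots = {
    "gene_a": (1200.0, 300.0),   # brighter in the Cy5-labeled sample
    "gene_b": (250.0, 1000.0),   # brighter in the Cy3-labeled reference
    "gene_c": (500.0, 500.0),    # unchanged
}

ratios = {name: expression_ratio(cy5, cy3) for name, (cy5, cy3) in spots.items()}
```

A ratio above 1 corresponds to a red spot (more mRNA in the Cy5-labeled sample) and a ratio below 1 to a green spot.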

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the 
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels dif- 
fered by a factor of 2 or more for only 19 
genes (0.3%), and the largest of these dif- 
ferences was only 2.7-fold (14). However, as 
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for 
203 genes diminished by a factor of at least 
4. About half of these differentially expressed
genes have no currently recognized
function and are not yet named. Indeed,
more than 400 of the differentially expressed
genes have no apparent homology
to any gene whose function is known (15).
The responses of these previously uncharacterized
genes to the diauxic shift therefore
provide the first small clue to their possible
roles.
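
The counts above come from applying a fold-change cutoff to each gene's expression ratio. A minimal sketch of that tally follows, assuming a plain list of ratios (sample over reference); the example values are hypothetical and the function names are illustrative only.

```python
def classify(ratio, threshold=2.0):
    """Label a ratio as induced, repressed, or unchanged at a given fold cutoff."""
    if ratio >= threshold:
        return "induced"
    if ratio <= 1.0 / threshold:
        return "repressed"
    return "unchanged"


def count_by_class(ratios, threshold=2.0):
    """Count how many ratios fall into each class."""
    counts = {"induced": 0, "repressed": 0, "unchanged": 0}
    for ratio in ratios:
        counts[classify(ratio, threshold)] += 1
    return counts


# Hypothetical expression ratios for six genes.
example_ratios = [4.5, 2.0, 1.1, 0.5, 0.2, 0.9]
twofold = count_by_class(example_ratios, threshold=2.0)
fourfold = count_by_class(example_ratios, threshold=4.0)
```

Raising the cutoff from 2 to 4 shrinks both the induced and the repressed sets, which mirrors the drop from roughly 710 and 1030 genes to 183 and 203 genes reported in the text.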

The global view of changes in expres- 
sion of genes with known functions pro- 
vides a vivid picture of the way in which 
the cell adapts to a changing environ- 
ment. Figure 3 shows a portion of the yeast 
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes 
coding for the enzymes aldehyde dehydrogenase
(ALD2) and acetyl-coenzyme
A (CoA) synthase (ACS1), which function
together to convert the products of
alcohol dehydrogenase into acetyl-CoA, 
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away 
from acetaldehyde, and instead to oxalac- 
etate, where it can serve to supply the 
TCA cycle and gluconeogenesis. Induction
of the pivotal genes PCK1, encoding
phosphoenolpyruvate carboxykinase, and
FBP1, encoding fructose 1,6-biphosphatase,
switches the directions of two key
irreversible steps in glycolysis, reversing
the flow of metabolites along the reversible
steps of the glycolytic pathway toward
the essential biosynthetic precursor,
glucose-6-phosphate. Induction of the genes
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of 
genes encoding pivotal enzymes can provide
insight into metabolic reprogramming,
the behavior of large groups of functionally
related genes can provide a broad
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate cy- 
cle and carbohydrate storage, were coordi- 
nately induced by glucose exhaustion. In 
contrast, genes devoted to protein synthesis,
including ribosomal proteins, tRNA
synthetases, and translation, elongation, 
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy 
and illuminating exception was that the 



genes encoding mitochondrial ribosomal 
genes were generally induced rather than 
repressed after glucose limitation, highlighting
the requirement for mitochondrial
biogenesis (13). As more is learned about 
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment 
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of ex- 
pression could be recognized, and sets of 
genes could be grouped on the basis of the 
similarities in their expression patterns. The 
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be
inferred for sets of genes with similar
expression profiles. For example, seven genes
showed a late induction profile, with mRNA
levels increasing by more than ninefold at
the last timepoint but less than threefold at
the preceding timepoint (Fig. 5B). All of
these genes were known to be glucose-repressed,
and five of the seven were previously
noted to share a common upstream activating
sequence (UAS), the carbon source response
element (CSRE) (16-20). A search
in the promoter regions of the remaining two
genes, ACR1 and IDP2, revealed that
ACR1, a gene essential for ACS1 activity,
also possessed a consensus CSRE motif, but
interestingly, IDP2 did not. A search of the
entire yeast genome sequence for the con- 
sensus CSRE motif revealed only four addi- 
tional candidate genes, none of which 
showed a similar induction. 

Examples from additional groups of 
genes that shared expression profiles are 
illustrated in Fig. 5, C through F. The 
sequences upstream of the named genes in 
Fig. 5C all contain stress response elements
(STRE), and with the exception
downstream genes at the diauxic shift.

Although most of the transcriptional 
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were re- 
producible in duplicate experiments. For 
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expres-
of HSP42, have previously been shown to
be controlled at least in part by these 
elements (21-24). Inspection of the se- 
quences upstream of HSP42 and the two 
uncharacterized genes shown in Fig. 5C, 
YKL026c, a hypothetical protein with 
similarity to glutathione peroxidase, and 
YGR043c, a putative transaldolase,
revealed that each of these genes also
possesses repeated upstream copies of the
stress-responsive CCCCT motif. Of the 13
additional genes in the yeast genome that
shared this expression profile [including 
HSP30, ALD2, OM45, and 10 uncharac- 
terized ORFs (25)], nine contained one or 
more recognizable STRE sites in their up- 
stream regions. 

The heterotrimeric transcriptional acti- 
vator complex HAP2,3,4 has been shown 
to be responsible for induction of several 
genes important for respiration (26-28). 
This complex binds a degenerate consensus 
sequence known as the CCAAT box (26). 
Computer analysis, using the consensus se- 
quence TNRYTGGB (29), has suggested 
that a large number of genes involved in 
respiration may be specific targets of 
HAP2,3,4 (30). Indeed, a putative 
HAP2,3,4 binding site could be found in 
the sequences upstream of each of the seven 
cytochrome c-related genes that showed 
the greatest magnitude of induction (Fig. 
5D). Of 12 additional cytochrome c-related
genes that were induced, HAP2,3,4 binding 
sites were present in all but one. Signifi- 
cantly, we found that transcription of 
HAP4 itself was induced nearly ninefold 
concomitant with the diauxic shift. 
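
A degenerate consensus such as TNRYTGGB can be searched for by expanding each IUPAC nucleotide code into a character class. The sketch below is one way to do this and is not the program used in the cited computer analysis; the test sequence is hypothetical, and only the IUPAC codes needed for this consensus are included.

```python
import re

# Standard IUPAC nucleotide codes used by the consensus TNRYTGGB.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "B": "[CGT]",   # not A
    "N": "[ACGT]",  # any base
}


def consensus_to_regex(consensus):
    """Expand an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, consensus):
    """Return (strand, start) for every match, in forward-strand coordinates.

    A lookahead is used so that overlapping sites are all reported.
    """
    seq = seq.upper()
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    hits = [("+", m.start()) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        hits.append(("-", len(seq) - m.start() - len(consensus)))
    return hits


# Hypothetical upstream fragment containing one forward-strand site.
upstream = "GGGTCACTGGCAAA"
sites = find_sites(upstream, "TNRYTGGB")
```

Both strands are scanned because a binding site upstream of a gene may lie in either orientation relative to the coding strand.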

Control of ribosomal protein biogenesis 
is mainly exerted at the transcriptional 
level, through the presence of a common 
upstream-activating element (UAS)
that is recognized by the Rap1 DNA-binding
protein (31, 32). The expression profiles
of seven ribosomal proteins are shown
in Fig. 5F. A search of the sequences
upstream of all seven genes revealed consensus
Rap1-binding motifs (33). It has
been suggested that declining Rap1 levels
in the cell during starvation may be responsible
for the decline in ribosomal protein
gene expression (34). Indeed, we observed
that the abundance of RAP1
mRNA diminished by 4.4-fold, at about
the time of glucose exhaustion.

Of the 149 genes that encode known or 
putative transcription factors, only two,
HAP4 and SIP4, were induced by a factor of
more than threefold at the diauxic shift.
SIP4 encodes a DNA-binding transcriptional
activator that has been shown to
interact with Snf1, the "master regulator" of
glucose repression (35). The eightfold induction
of SIP4 upon depletion of glucose
strongly suggests a role in the induction of



sion ratios measured in these duplicate
experiments differed by less than a factor 
of 2. However, in a few cases, there were 
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions,
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
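
The two reproducibility figures quoted above, a correlation coefficient between duplicate hybridizations and the share of genes whose duplicate ratios agree within a factor of 2, can be computed as sketched below. This is an illustration only: the text does not say whether the correlation was taken on raw or log-transformed ratios, so the use of log2 ratios here is an assumption, and the replicate values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_ratio_correlation(rep1, rep2):
    """Correlation of log2 expression ratios (an assumed choice of scale)."""
    return pearson([math.log2(r) for r in rep1], [math.log2(r) for r in rep2])


def fraction_within_factor(rep1, rep2, factor=2.0):
    """Fraction of genes whose two ratio measurements differ by less than `factor`."""
    agree = sum(1 for a, b in zip(rep1, rep2) if max(a, b) / min(a, b) < factor)
    return agree / len(rep1)


# Hypothetical duplicate expression ratios for four genes.
replicate_1 = [1.0, 4.0, 0.5, 8.0]
replicate_2 = [1.5, 3.0, 0.5, 2.0]
agreement = fraction_within_factor(replicate_1, replicate_2)
```

Here three of the four hypothetical genes agree within twofold; the fourth (8.0 versus 2.0) differs by a factor of 4 and is counted as discordant.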

The changes in gene expression during 
this diauxic shift are complex and involve 
integration of many kinds of information 
about the nutritional and metabolic state 
of the cell. The large number of genes 
whose expression is altered and the diver- 
sity of temporal expression profiles ob- 
served in this experiment highlight the 
challenge of understanding the underlying 
regulatory mechanisms. One approach to 
defining the contributions of individual 
regulatory genes to a complex program of
this kind is to use DNA microarrays to 
identify genes whose expression is affected 



Fig. 2. The section of the ar- 
ray indicated by the gray box 
in Fig. 1 is shown for each of 
the experiments described 
here. Representative genes 
are labeled. In each of the ar- 
rays used to analyze gene 
expression during the diauxic 
shift, red spots represent 
genes that were induced rel- 
ative to the initial timepoint, 
and green spots represent 
genes that were repressed 
relative to the initial timepoint. 
In the arrays used to analyze 
the effects of the tup1Δ mutation
and YAP1 overexpression,
red spots represent
genes whose expression was 
increased, and green spots 
represent genes whose ex- 
pression was decreased by 
the genetic modification. Note 
that distinct sets of genes are 
induced and repressed in the 
different experiments. The 
complete images of each of 
these arrays can be viewed on 
the Internet (13). Cell density 
as measured by optical densi- 
ty (OD) at 600 nm was used to 
measure the growth of the 
culture.

by mutations in each putative regulatory
gene. As a test of this strategy, we analyzed
the genomewide changes in gene expression
that result from deletion of the TUP1 gene.
Transcriptional repression of many genes by 
glucose requires the DNA-binding repressor 



Mig1 and is mediated by recruiting the transcriptional
co-repressors Tup1 and Cyc8/Ssn6
(39). Tup1 has also been implicated in
repression of oxygen-regulated, mating-type-specific,
and DNA-damage-inducible genes
(40).




Fig. 3. Metabolic reprogramming inferred from global analysis of changes in gene expression. Only key
metabolic intermediates are identified. The yeast genes encoding the enzymes that catalyze each step
in this metabolic circuit are identified by name in the boxes. The genes encoding succinyl-CoA synthase
and glycogen-debranching enzyme have not been explicitly identified, but the ORFs YGR244 and
YPR184 show significant homology to known succinyl-CoA synthase and glycogen-debranching enzymes,
respectively, and are therefore included in the corresponding steps in this figure. Red boxes with
white lettering identify genes whose expression increases in the diauxic shift. Green boxes with dark
green lettering identify genes whose expression diminishes in the diauxic shift. The magnitude of
induction or repression is indicated for these genes. For multimeric enzyme complexes such as
succinate dehydrogenase, the indicated fold-induction represents an unweighted average of all the
genes listed in the box. Black and white boxes indicate no significant differential expression (less than
twofold). The direction of the arrows connecting reversible enzymatic steps indicates the direction of the
flow of metabolic intermediates, inferred from the gene expression pattern, after the diauxic shift. Arrows
representing steps catalyzed by genes whose expression was strongly induced are highlighted in red.
The broad gray arrows represent major increases in the flow of metabolites after the diauxic shift,
inferred from the indicated changes in gene expression.

Wild-type yeast cells and cells bearing
a deletion of the TUP1 gene (tup1Δ) were
grown in parallel cultures in rich medium
containing glucose as the carbon source.
Messenger RNA was isolated from exponentially
growing cells from the two populations
and used to prepare cDNA labeled
with Cy3 (green) and Cy5 (red),
respectively (11). The labeled probes were
mixed and simultaneously hybridized to
the microarray. Red spots on the microarray
therefore represented genes whose
transcription was induced in the tup1Δ
strain, and thus presumably repressed by
Tup1 (41). A representative section of the
microarray (Fig. 2, bottom middle panel)
illustrates that the genes whose expression
was affected by the tup1Δ mutation were,
in general, distinct from those induced
upon glucose exhaustion [complete images
of all the arrays shown in Fig. 2 are available
on the Internet (13)]. Nevertheless,
34 (10%) of the genes that were induced
by a factor of at least 2 after the diauxic
shift were similarly induced by deletion of
TUP1, suggesting that these genes may be
subject to TUP1-mediated repression by
glucose. For example, SUC2, the gene encoding
invertase, and all five hexose transporter
genes that were induced during the
course of the diauxic shift were similarly
induced, in duplicate experiments, by the
deletion of TUP1.

The set of genes affected by Tup1 in this
experiment also included α-glucosidases,
the mating-type-specific genes MFA1 and
MFA2, and the DNA damage-inducible
RNR2 and RNR4, as well as genes involved
in flocculation and many genes of unknown
function. The hybridization signal corresponding
to expression of TUP1 itself was
also severely reduced because of the (incomplete)
deletion of the transcription unit
in the tup1Δ strain, providing a positive
control in the experiment (42).

Many of the transcriptional targets of
Tup1 fell into sets of genes with related
biochemical functions. For instance, al-
though only about 3% of all yeast genes
appeared to be TUP1-repressed by a factor
of more than 2 in duplicate experiments
under these conditions, 6 of the 13 genes
that have been implicated in flocculation
(15) showed a reproducible increase in
expression of at least twofold when TUP1
was deleted. Another group of related
genes that appeared to be subject to TUP1
repression encodes the serine-rich cell
wall mannoproteins, such as Tip1 and
Tir1/Srp1, which are induced by cold
shock and other stresses (43), and similar,
serine-poor proteins, the seripauperins
(44). Messenger RNA levels for 23 of the
26 genes in this group were reproducibly
elevated by at least 2.5-fold in the tup1Δ
strain, and 18 of these genes were induced
by more than sevenfold when TUP1 was
deleted. In contrast, none of 83 genes that
could be classified as putative regulators of
the cell division cycle were induced more
than twofold by deletion of TUP1. Thus,
despite the diversity of the regulatory sys-
tems that employ Tup1, most of the genes
that it regulates under these conditions
fall into a limited number of distinct func-
tional classes.
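
The criterion used above, a reproducible induction of at least twofold in duplicate experiments, can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' analysis software: the function name, the dictionary layout, and the example ratios are hypothetical, and each experiment is assumed to supply one Cy5/Cy3 (mutant over wild-type) ratio per gene.

```python
def reproducibly_induced(experiments, min_fold=2.0):
    """Return the genes whose red/green ratio reaches min_fold in every experiment.

    experiments: list of dicts mapping gene name -> Cy5/Cy3 intensity ratio
    (mutant signal divided by wild-type signal). A gene missing from any
    experiment is never called induced.
    """
    if not experiments:
        return set()
    shared = set(experiments[0])
    for exp in experiments[1:]:
        shared &= set(exp)
    return {g for g in shared if all(exp[g] >= min_fold for exp in experiments)}


# Hypothetical ratios for two duplicate hybridizations (illustrative values only).
rep1 = {"GENE_A": 7.4, "GENE_B": 2.6, "GENE_C": 1.1, "GENE_D": 2.3}
rep2 = {"GENE_A": 8.1, "GENE_B": 1.7, "GENE_C": 0.9, "GENE_D": 2.0}
induced = reproducibly_induced([rep1, rep2])
```

Requiring the threshold in every replicate, rather than on the mean ratio, is one simple reading of "reproducible"; the source does not state which rule was applied.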

Because the microarray allows us to
monitor expression of nearly every gene in
yeast, we can, in principle, use this ap-
proach to identify all the transcriptional
targets of a regulatory protein like Tup1. It
is important to note, however, that in any
single experiment of this kind we can only
recognize those target genes that are nor-
mally repressed (or induced) under the
conditions of the experiment. For in-
stance, the experiment described here an-
alyzed a MATα strain in which MFA1
and MFA2, the genes encoding the a-
factor mating pheromone precursor, are
normally repressed. In the isogenic tup1Δ
strain, these genes were inappropriately
expressed, reflecting the role that Tup1
plays in their repression. Had we instead
carried out this experiment with a MATa
strain (in which expression of MFA1 and
MFA2 is not repressed), it would not have
been possible to conclude anything re-
garding the role of Tup1 in the repression
of these genes. Conversely, we cannot dis-
tinguish indirect effects of the chronic
absence of Tup1 in the mutant strain from
effects directly attributable to its partici-
pation in repressing the transcription of a
gene.

Another simple route to modulating the
activity of a regulatory factor is to overex-
press the gene that encodes it. YAP1 en-
codes a DNA-binding transcription factor
belonging to the b-zip class of DNA-bind-
ing proteins. Overexpression of YAP1 in
yeast confers increased resistance to hydro-
gen peroxide, o-phenanthroline, heavy
metals, and osmotic stress (45). We ana-
lyzed differential gene expression between a
wild-type strain bearing a control plasmid
and a strain with a plasmid expressing YAP1
under the control of the strong GAL1-10
promoter, both grown in galactose (that is,
a condition that induces YAP1 overexpres-
sion). Complementary DNA from the con-
trol and YAP1-overexpressing strains, la-
beled with Cy3 and Cy5, respectively, was
prepared from mRNA isolated from the two
strains and hybridized to the microarray.
Thus, red spots on the array represent genes
that were induced in the strain overexpress-
ing YAP1.

Of the 17 genes whose mRNA levels
increased by more than threefold when
YAP1 was overexpressed in this way, five
bear homology to aryl-alcohol oxidoreduc-
tases (Fig. 2 and Table 1). An additional
four of the genes in this set also belong to
the general class of dehydrogenases/oxi-
doreductases. Very little is known about
the role of aryl-alcohol oxidoreductases in
S. cerevisiae, but these enzymes have been
isolated from ligninolytic fungi, in which
they participate in coupled redox reac-
tions, oxidizing aromatic and aliphatic
unsaturated alcohols to aldehydes with the
production of hydrogen peroxide (46, 47).
The fact that a remarkable fraction of the
targets identified in this experiment be-
long to the same small, functional group of
oxidoreductases suggests that these genes
might play an important protective role
during oxidative stress. Transcription of a
small number of genes was reduced in the
strain overexpressing Yap1. Interestingly,
many of these genes encode sugar per-
meases or enzymes involved in inositol
metabolism.

Fig. 4. Coordinated regulation of functionally related genes. The curves
represent the average induction or repression ratios for all the genes in
each indicated group. The total number of genes in each group was
as follows: ribosomal proteins, 112; translation elongation and initiation
factors, 25; tRNA synthetases (excluding mitochondrial synthetases), 17; glycogen and trehalose syn-

We searched for Yap1-binding sites
(TTACTAA or TGACTAA) in the se-
quences upstream of the target genes we
identified (48). About two-thirds of the
genes that were induced by more than
threefold upon Yap1 overexpression had
one or more binding sites within 600 bases
upstream of the start codon (Table 1), sug-
gesting that they are directly regulated by
Yap1. The absence of canonical Yap1-bind-
ing sites upstream of the others may reflect
an ability of Yap1 to bind sites that differ
from the canonical binding sites, perhaps in
cooperation with other factors, or less like-
ly, may represent an indirect effect of Yap1
overexpression, mediated by one or more
intermediary factors. Yap1 sites were found
only four times in the corresponding region
of an arbitrary set of 30 genes that were not
differentially regulated by Yap1.

Table 1. Genes induced by more than threefold upon YAP1 overexpression.

ORF       Yap1 site(s)        Gene  Description                                                      Fold increase
YNL331C   -                   -     Putative aryl-alcohol reductase                                  12.9
YKL071W   162-222 (5 sites)   -     Similarity to bacterial csgA protein                             10.4
YML007W   -                   YAP1  Transcriptional activator involved in oxidative stress response  9.8
YFL056C   223, 242            -     Homology to aryl-alcohol dehydrogenases                          9.0
YLL060C   98                  -     Putative glutathione transferase                                 7.4
YOL165C   266                 -     Putative aryl-alcohol dehydrogenase (NADP+)                      7.0
YCR107W   -                   -     Putative aryl-alcohol reductase                                  6.5
YML116W   409                 ATR1  Aminotriazole and 4-nitroquinoline resistance protein            6.5
YBR008C   142, 167, 364       -     Homology to benomyl/methotrexate resistance protein              6.1
YCLX08C   -                   -     Hypothetical protein                                             [illegible]
YJR155W   -                   -     Putative aryl-alcohol dehydrogenase                              6.0
YPL171C   148, 212            OYE3  NADPH dehydrogenase (old yellow enzyme), isoform 3               5.8
YLR460C   167, 317            -     Homology to hypothetical proteins YCR102c and YNL134c            4.7
YKR076W   178                 -     Homology to hypothetical protein YMR251W                         4.5
YHR179W   327                 OYE2  NAD(P)H oxidoreductase (old yellow enzyme), isoform 1            4.1
YML131W   507                 -     Similarity to A. thaliana zeta-crystallin homolog                3.7
YOL126C   -                   MDH2  Malate dehydrogenase                                             3.3
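
The binding-site search described above amounts to scanning the 600 bases upstream of each start codon for the two heptamers named in the text. The sketch below is a minimal illustration and not the authors' procedure: the function names are hypothetical, the input is assumed to be the upstream sequence written 5' to 3' and ending immediately before the start codon, and scanning the reverse strand as well is an assumption that can be switched off.

```python
import re

# The two canonical heptamers named in the text.
BINDING_MOTIFS = ("TTACTAA", "TGACTAA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_yap1_sites(upstream, window=600, both_strands=True):
    """Return sorted distances, in bases upstream of the start codon, of motif starts.

    upstream: sequence 5'->3' whose last base lies 1 base upstream of the
    start codon. Only the last `window` bases are scanned.
    """
    region = upstream.upper()[-window:]
    n = len(region)
    hits = set()
    for motif in BINDING_MOTIFS:
        patterns = {motif}
        if both_strands:  # assumption: a site on either strand counts
            patterns.add(reverse_complement(motif))
        for pattern in patterns:
            # A lookahead keeps overlapping occurrences from being skipped.
            for match in re.finditer(f"(?={pattern})", region):
                hits.add(n - match.start())
    return sorted(hits)


# Toy upstream region: the motif starts 12 bases before the start codon.
example_sites = find_yap1_sites("A" * 10 + "TTACTAA" + "C" * 5)
```

Running the same function over an arbitrary set of unregulated genes gives the kind of background count the text uses as a control.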

Use of a DNA microarray to character- 
ize the transcriptional consequences of 
mutations affecting the activity of regula-
tory molecules provides a simple and pow-
erful approach to dissection and character-
ization of regulatory pathways and net-
works. This strategy also has an important
practical application in drug screening. 
Mutations in specific genes encoding can- 
didate drug targets can serve as surrogates 
for the ideal chemical inhibitor or modu- 
lator of their activity. DNA microarrays 
can be used to define the resulting signa- 
ture pattern of alterations in gene expres- 
sion, and then subsequently used in an 
assay to screen for compounds that repro- 
duce the desired signature pattern. 
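
One way to make "reproduce the desired signature pattern" concrete is to score how closely a compound-induced expression profile tracks the profile of the corresponding mutant. The sketch below uses the Pearson correlation of log2 expression ratios over the genes measured in both experiments. The choice of metric, the function names, and the example values are all assumptions made for illustration; the source does not specify a comparison method.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def signature_similarity(reference, candidate):
    """Correlate log2 expression ratios over the genes present in both profiles.

    reference, candidate: dicts mapping gene name -> expression ratio (> 0).
    """
    genes = sorted(set(reference) & set(candidate))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    ref = [math.log2(reference[g]) for g in genes]
    cand = [math.log2(candidate[g]) for g in genes]
    return pearson(ref, cand)


# Hypothetical profiles (illustrative values only).
mutant = {"G1": 4.0, "G2": 0.5, "G3": 1.0, "G4": 8.0}
mimic = dict(mutant)                                  # same pattern
opposite = {g: 1.0 / r for g, r in mutant.items()}    # inverted pattern
score_mimic = signature_similarity(mutant, mimic)
score_opposite = signature_similarity(mutant, opposite)
```

Working in log space treats a twofold induction and a twofold repression as equal and opposite changes, which is why the inverted profile scores close to -1.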

DNA microarrays provide a simple and 
economical way to explore gene expres- 
sion patterns on a genomic scale. The 
hurdles to extending this approach to any 
other organism are minor. The equipment 
required for fabricating and using DNA
microarrays (9) consists of components
that were chosen for their modest cost and 
simplicity. It was feasible for a small group 
to accomplish the amplification of more 
than 6000 genes in about 4 months and, 
once the amplified gene sequences were in 
hand, only 2 days were required to print a 
set of 110 microarrays of 6400 elements 
each. Probe preparation, hybridization, 
and fluorescent imaging are also simple 
procedures. Even conceptually simple ex- 
periments, as we described here, can yield 
vast amounts of information. The value of 
the information from each experiment of 
this kind will progressively increase as 
more is learned about the functions of 
each gene and as additional experiments 
define the global changes in gene expres- 
sion in diverse other natural processes and 
genetic perturbations. Perhaps the greatest 
challenge now is to develop efficient 
methods for organizing, distributing, inter- 
preting, and extracting insights from the 
large volumes of data these experiments 
will provide. 
ABSTRACT The recent ability to sequence whole genomes
allows ready access to all genetic material. The approaches 
outlined here allow automated analysis of sequence for the 
synthesis of optimal primers in an automated multiplex 
oligonucleotide synthesizer (AMOS). The efficiency is such 
that all ORFs for an organism can be amplified by PCR. The 
resulting amplicons can be used directly in the construction of 
DNA arrays or can be cloned for a large variety of functional 
analyses. These tools allow a replacement of single-gene 
analysis with a highly efficient whole-genome analysis. 



The genome sequencing projects have generated and will 
continue to generate enormous amounts of sequence data. The 
genomes of Saccharomyces cerevisiae, Escherichia coli, Hae- 
mophilus influenzae (1), Mycoplasma genitalium (2), and Meth- 
anococcus jannaschii (3) have been completely sequenced. 
Other model organisms have had substantial portions of their 
genomes sequenced as well, including the nematode Caeno- 
rhabditis elegans (4) and the small flowering plant Arabidopsis 
thaliana (5). This massive and increasing amount of sequence 
information allows the development of novel experimental 
approaches to identify gene function. 

One standard use of genome sequence data is to attempt to 
identify the functions of predicted open reading frames 
(ORFs) within the genome by comparison to genes of known 
function. Such a comparative analysis of all ORFs to existing 
sequence data is fast, simple, and requires no experimentation 
and is therefore a reasonable first step. While finding sequence 
homologies/motifs is not a substitute for experimentation, 
noting the presence of sequence homology and/or sequence 
motifs can be a useful first step in finding interesting genes, in 
designing experiments and, in some cases, predicting function. 
However, this type of analysis is frequently uninformative. For 
example, over one-half of new ORFs in S. cerevisiae have no 
known function (6). If this is the case in a well studied organism 
such as yeast, the problem will be even worse in organisms that 
are less well studied or less manipulable. A large, experimen- 
tally determined gene function database would make homol- 
ogy/motif searches much more useful. 

Experimental analysis must be performed to thoroughly 
understand the biological function of a gene product. Scaling 
up from classical "cottage industry" one-gene-oriented ap- 
proaches to whole-genome analysis would be very expensive 
and laborious. It is clear that novel strategies are necessary to 
efficiently pursue the next phase of the genome projects — 
whole-genome experimental analysis to explore gene expres- 
sion, gene product function, and other genome functions. 
Model organisms, such as S. cerevisiae, will be extremely 
important in the development of novel whole-genome analysis
techniques and, subsequently, in improving our understanding 
of other more complex and less manipulable organisms. 

The genome sequence can be systematically used as a tool 
to understand ORFs, gene product function, and other ge- 
nome regions. Toward this end, a directed strategy has been 
developed for exploiting sequence information as a means of 
providing information about biological function (Fig. 1). Ef- 
forts have been directed toward the amplification of each 
predicted ORF or any other region of the genome ranging 
from a few base pairs to several kilobase pairs. There are many 
uses for these amplicons — they can be cloned into standard 
vectors or specialized expression vectors, or can be cloned into 
other specialized vectors such as those used for two-hybrid 
analysis. The amplicons can also be used directly by, for 
example, arraying onto glass for expression analysis, for DNA 
binding assays, or for any direct DNA assay (7). As a pilot 
study, synthetic primers were made on the 96-well automated
multiplex oligonucleotide synthesizer (AMOS) instrument (8) 
(Fig. 2). These oligonucleotides were used to amplify each 
ORF on yeast chromosome V. The current version of this 
instrument can synthesize three plates of 96 oligonucleotides 
each (25 bases) in an 8-hr day. The amplification of the entire 
set of PCR products was then analyzed by gel electrophoresis 
(Fig. 3). The success rate for amplifying a product of the proper
length on the first attempt was 95%. This project demonstrates that
one can go directly from sequence information to biological 
analysis in a truly automated, totally directed manner. 

These amplicons can be incorporated directly in arrays or 
the amplicons can be cloned. If the amplicons are to be cloned, 
novel sequences can be incorporated at the 5' end of the 
oligonucleotide to facilitate cloning. One potential problem 
with cloning PCR products is that the cloned amplicons may 
contain sequence alterations that diminish their utility. One 
option would be to resequence each individual amplicon. 
However, this is expensive, inefficient, and time consuming. A
faster, more cost-effective, and more accurate approach is to 
apply comparative sequencing by denaturing HPLC (9). This 
method is capable of detecting a single base change in a 2-kb 
heteroduplex. Longer amplicons can be analyzed by use of
appropriate restriction fragments. If any change is detected in 
a clone, an alternate clone of the same region can be analyzed. 
Modifying the system to allow high throughput analysis by 
denaturing HPLC is also relatively simple and straightforward. 

If amplicons are used directly on arrays without cloning, it 
is important to note that, even if single PCR product bands are 
observed on gels, the PCR products will be contaminated with 
various amounts of other sequences. This contamination has 
the potential to affect the results in, for example, expression 
Fig. 1. Overview of systematic method for isolating individual
genes. Sequence information is obtained automatically from sequence 
databases. The data are input into primer selection software specifi- 
cally designed to target ORFs as designated by database annotations. 
The output file containing the primer information is directly read by 
a high-throughput oligonucleotide synthesizer, which makes the oli- 
gonucleotides in 96-well plates (AMOS, automated multiplex oligo- 
nucleotide synthesizer). The forward and reverse primers are synthe- 
sized in the same location on separate plates to facilitate the down- 
stream handling of primers. The amplicons are generated by PCR in 
96-well plates as well. 

analysis. On the other hand, direct use of the amplicons is 
much less labor intensive and greatly decreases the occurrence 
of mistakes in clone identification, a ubiquitous problem 
associated with large clone set archiving and retrieving. 

Any large-scale effort to capture each ORF within a genome 
must rely on automation if cost is to be minimized while 
efficiency is maximized. Toward that end, primers targeting 
ORFs were designed automatically using simple new scripts 
and existing primer selection software. These script-selected 
primer sequences were directly read by the high-throughput 
synthesizer and the forward and reverse primers were synthe- 
sized in separate plates in corresponding wells to facilitate 
automated pipetting and PCR amplifications. Each of the 
resulting PCR products, generated with minimum labor, con- 
tains a known, unique ORF. 
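
The bookkeeping behind this pipeline can be sketched as follows. The example takes the first 25 bases of each ORF as the forward primer and the reverse complement of the last 25 bases as the reverse primer, then assigns both to the same well position on separate 96-well plates. This naive end-anchored rule stands in for the primer selection software mentioned in the text, which is not described in enough detail to reproduce; all function names and the record layout are hypothetical.

```python
from string import ascii_uppercase

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def naive_primers(orf_seq, length=25):
    """Forward primer = first `length` bases; reverse = revcomp of the last `length`."""
    orf_seq = orf_seq.upper()
    if len(orf_seq) < 2 * length:
        raise ValueError("ORF too short for non-overlapping primers")
    return orf_seq[:length], reverse_complement(orf_seq[-length:])


def well_name(index):
    """Map a 0-based ORF index to (plate number, well), filling rows A-H by 12 columns."""
    plate, position = divmod(index, 96)
    row, column = divmod(position, 12)
    return plate + 1, f"{ascii_uppercase[row]}{column + 1}"


def plate_layout(orfs):
    """orfs: list of (name, sequence) in chromosome order.

    Each record places the forward and reverse primers in the same well of
    the forward plate and the reverse plate, respectively.
    """
    layout = []
    for i, (name, seq) in enumerate(orfs):
        forward, reverse = naive_primers(seq)
        plate, well = well_name(i)
        layout.append({"orf": name, "plate": plate, "well": well,
                       "forward": forward, "reverse": reverse})
    return layout


# Toy ORF (illustrative sequence only).
toy_layout = plate_layout([("ORF_1", "ATG" + "C" * 50 + "TAA")])
```

Keeping the two primers of a pair at matching well positions means a multichannel pipettor can combine the plates without any lookup table, which is the handling advantage the text describes.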

Large-scale genome analysis projects are dependent on 
newly emerging technologies to make the studies practical and 
economically feasible. For example, the cost of the primers, a 
significant issue in the past, has been reduced dramatically to 
make feasible this and other projects that require tens of 
thousands of oligonucleotides. Other methods of high- 
throughput analysis are also vital to the success of functional 
analysis projects, such as microarraying and oligonucleotide 
chip methods (10-14). 

Changes in attitude are also required. One of the major costs 
of commercial oligonucleotides is extensive quality control 
such that virtually 100% of the supplied oligonucleotides are 
successfully synthesized and work for their intended purpose. 
Fig. 2. Overall approach for using database of a genome to direct
biological analysis. The synthesis of the 6,000 ORFs (orfs) for each
gene of S. cerevisiae can be used in many applications utilizing both
cloning and microarraying technology. 

Considerable cost reduction can be obtained by simply de- 
creasing the expected successful synthesis rate to 95-97%. One 
can then achieve faster and cheaper whole genome coverage by 
simply adding a single quality control at the end of the 
experiment and batching the failures for resynthesis. 

The directed nature of the amplicon approach is of clear 
advantage. The sequence of each ORF is analyzed automati- 
cally, and unique specific primers are made to target each 
ORF. Thus, there is relatively little time or labor involved — for 
example, no random cloning and subsequent screening is 
required because each product is known. In the test system, 
primers for 240 ORFs from chromosome V were systematically 
synthesized, beginning from the left arm and continuing 
through to the right arm. At no point was there any manual 
analysis of sequence information to generate the collection. In 
many ways, now that the sequence is known, there is no need 
for the researcher to examine it. 

These amplicons can be arrayed and expression analysis can 
be done on all arrayed ORFs with a single hybridization (10). 
Those ORFs that display significant differential expression 
patterns under a given selection are easily identified without 
the laborious task of searching for and then sequencing a clone. 
Once scaled up, the procedure provides even greater returns 
on effort, because a single hybridization will ultimately provide 
a "snapshot" of the expression of all genes in the yeast genome. 
Thus, the limiting factor in whole genome analysis will not be 
the analysis process itself, but will instead be the ability of 
researchers to design and carry out experimental selections. 

Current expression and genetic analysis technologies are 
geared toward the analysis of single genes and are ill suited to 
analyze numerous genes under many conditions. Additional 
difficulties with current technologies include: the effort and 
expense required to analyze expression and make mutants, the 
potential duplication of effort if done by different laboratories, 
and the possibility of conflicting results obtained from differ- 
ent laboratories. In contrast, whole genome analysis not only 
is more efficient, it also provides data of much higher quality; 
all genes are assayed and compared in parallel under exactly 
Fig. 3. Gel image of amplifications. Using the method described in Fig. 1, amplicons were generated for ORFs of S. cerevisiae chromosome
V. One plate of 96 amplification reactions is shown. 



the same conditions. In addition, amplicons have many appli- 
cations beyond gene expression. For example, one recent 
approach is to incorporate a unique DNA sequence tag, 
synthesized as part of each gene specific primer, during 
amplification. The tags or molecular bar codes, when reintro- 
duced into the organism as a gene deletion or as a gene clone, 
can be used much more efficiently than individual mutations 
or clones because pools of tagged mutants or transformants 
can be analyzed in parallel. This parallel analysis is possible 
because the tags are readily and quantitatively amplified even 
in complex mixtures of tags (13). 

These ORF genome arrays and oligonucleotide tagged 
libraries can be used for many applications. Any conventional 
selection applied to a library that gives discrete or multiple 
products can use these technologies for a simple direct read- 
out. These include screens and selections for mutant comple- 
mentation, overexpression suppression (15, 16), second-site 
suppressors, synthetic lethality, drug target overexpression 
(17), two-hybrid screens (18), genome mismatch scanning (19), 
or recombination mapping. 

The genome projects have provided researchers with a vast 
amount of information. These data must be used efficiently 
and systematically to gain a truly comprehensive understand- 
ing of gene function and, more broadly, of the entire genome 
which can then be applied to other organisms. Such global 
approaches are essential if we are to gain an understanding of 
the living cell. This understanding should come from the 
viewpoint of the integration of complex regulatory networks, 
the individual roles and interactions of thousands of functional 
gene products, and the effect of environmental changes on 
both gene regulatory networks and the roles of all gene 
products. The time has come to switch from the analysis of a 
single gene to the analysis of the whole genome. 

Support was provided by National Institutes of Health Grants 
R37H60198 and P01H600205. 
INTRODUCTION

Technological advancements combined with in- 
tensive DNA sequencing efforts have generated an 
enormous database of sequence information over the 
past decade. To date, more than 3 million sequences, 
totaling over 2.2 billion bases [1], are contained 
within the GenBank database, which includes the 
complete sequences of 19 different organisms [2]. The 
first complete sequence of a free-living organism, 
Haemophilus influenzae, was reported in 1995 [3] and 
was followed shortly thereafter by the first complete 
sequence of a eukaryote, Saccharomyces cerevisiae [4].
The development of dramatically improved sequenc- 
ing methodologies promises that complete elucida- 
tion of the Homo sapiens DNA sequence is not far 
behind [5].

To exploit more fully the wealth of new sequence 
information, it was necessary to develop novel meth- 
ods for the high-throughput or parallel monitoring 
of gene expression. Established methods such as 
northern blotting, RNAse protection assays, S1 nu-
clease analysis, plaque hybridization, and slot blots
do not provide sufficient throughput to effectively 
utilize the new genomics resources. Newer methods 
such as differential display [6], high-density filter 
hybridization [7,8], serial analysis of gene expression 
[9], and cDNA- and oligonucleotide-based microarray 
"chip" hybridization [10-12] are possible solutions 
to this bottleneck. It is our belief that the microarray 
approach, which allows the monitoring of expres- 
sion levels of thousands of genes simultaneously, is 
a tool of unprecedented power for use in toxicology 
studies. 



Almost without exception, gene expression is al- 
tered during toxicity, as either a direct or indirect 
result of toxicant exposure. The challenge facing 
toxicologists is to define, under a given set of ex- 
perimental conditions, the characteristic and spe- 
cific pattern of gene expression elicited by a given 
toxicant. Microarray technology offers an ideal plat- 
form for this type of analysis and could be the foun- 
dation for a fundamentally new approach to 
toxicology testing. 

MICROARRAY DEVELOPMENT AND APPLICATIONS 

cDNA Microarrays 

In the past several years, numerous systems were 
developed for the construction of large-scale DNA 
arrays. All of these platforms are based on cDNAs 
or oligonucleotides immobilized to a solid sup- 
port. In the cDNA approach, cDNA (or genomic) 
clones of interest are arrayed in a multi-well for- 
mat and amplified by polymerase chain reaction. 
The products of this amplification, which are usu- 
ally 500- to 2000-bp clones from the 3' regions of 
the genes of interest, are then spotted onto solid 
support by using high-speed robotics. By using 
this method, microarrays of up to 10 000 clones 
can be generated by spotting onto a glass substrate 
[13,14]. Sample detection for microarrays on glass
involves the use of probes labeled with fluores- 
cent or radioactive nucleotides. 

Fluorescent cDNA probes are generated from con- 
trol and test RNA samples in single-round reverse-tran- 
scription reactions in the presence of fluorescently 
tagged dUTP (e.g., Cy3-dUTP and Cy5-dUTP), which 
produces control and test products labeled with dif- 
ferent fluors. The cDNAs generated from these two 
populations, collectively termed the "probe," are then 
mixed and hybridized to the array under a glass cov- 
erslip [10,11,15]. The fluorescent signal is detected 
by using a custom-designed scanning confocal mi- 
croscope equipped with a motorized stage and lasers 
for fluor excitation [10,11,15]. The data are analyzed 
with custom digital image analysis software that de- 
termines for each DNA feature the ratio of fluor 1 to 
fluor 2, corrected for local background [16,17]. The 
strength of this approach lies in the ability to label 
RNAs from control and treated samples with differ- 
ent fluorescent nucleotides, allowing for the simul- 
taneous hybridization and detection of both 
populations on one microarray. This method elimi- 
nates the need to control for hybridization between 
arrays. The research groups of Drs. Patrick Brown and 
Ron Davis at Stanford University spearheaded the 
effort to develop this approach, which has been suc- 
cessfully applied to studies of Arabidopsis thaliana 
RNA [10], yeast genomic DNA [15], tumorigenic ver- 
sus non-tumorigenic human tumor cell lines [11], 
human T-cells [18], yeast RNA [19], and human in- 
flammatory disease-related genes [20]. The most dra- 
matic result of this effort was the first published 
account of gene expression of an entire genome, that 
of the yeast Saccharomyces cerevisiae [21].
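
The per-feature quantity described above, the ratio of fluor 1 to fluor 2 corrected for local background, can be written out as a short calculation. This is a sketch under stated assumptions rather than the custom image analysis software cited in the text: the function name and arguments are hypothetical, and clipping background-subtracted signals to a small positive floor is an added safeguard so that the ratio stays defined for very dim spots.

```python
def corrected_ratio(fluor1, fluor2, background1, background2, floor=1.0):
    """Ratio of fluor 1 to fluor 2 after subtracting each channel's local background.

    fluor1, fluor2: mean spot intensities in the two channels.
    background1, background2: local background intensities around the spot.
    floor: smallest signal allowed after subtraction (assumption, avoids
    division by zero and negative ratios).
    """
    signal1 = max(fluor1 - background1, floor)
    signal2 = max(fluor2 - background2, floor)
    return signal1 / signal2


# Hypothetical intensities: (1200 - 200) / (350 - 100) = 4.0
ratio = corrected_ratio(1200, 350, 200, 100)
```

Because both samples are hybridized to the same spot, spot-to-spot differences in the amount of printed DNA cancel in this ratio, which is the practical benefit of the two-color design.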

In an alternative approach, large numbers of cDNA 
clones can be spotted onto a membrane support, al- 
beit at a lower density [7,22]. This method is useful 
for expression profiling and large-scale screening and 
mapping of genomic or cDNA clones [7,22-24]. In 
expression profiling on filter membranes, two dif- 
ferent membranes are used simultaneously for con- 
trol and test RNA hybridizations, or a single 
membrane is stripped and reprobed. The signal is 
detected by using radioactive nucleotides and visu- 
alized by phosphorimager analysis or autoradiogra- 
phy. Numerous companies now sell such cDNA 
membranes and software to analyze the image data 
[25-27]. 

Oligonucleotide Microarrays 

Oligonucleotide microarrays are constructed either 
by spotting prefabricated oligos on a glass support 
[13] or by the more elegant method of direct in situ 
oligo synthesis on the glass surface by photolithog- 
raphy [28-30]. The strength of this approach lies in 
its ability to discriminate DNA molecules based on 
single base-pair difference. This allows the applica- 
tion of this method to the fields of medical diagnos- 



tics, pharmacogenetics, and sequencing by hybrid- 
ization as well as gene-expression analysis. 

Fabrication of oligonucleotide chips by photoli- 
thography is theoretically simple but technically 
complex [29,30]. The light from a high-intensity 
mercury lamp is directed through a photolitho- 
graphic mask onto the silica surface, resulting in 
deprotection of the terminal nucleotides in the illu- 
minated regions. The entire chip is then reacted with 
the desired free nucleotide, resulting in selected chain 
elongation. This process requires only 4n cycles 
(where n = oligonucleotide length in bases) to syn- 
thesize a vast number of unique oligos, the total num- 
ber of which is limited only by the complexity of the 
photolithographic mask and the chip size [29,31,32]. 
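
The 4n bound follows from cycling through the four nucleotides once per position: at each position, every feature needs exactly one of the four bases, so four masked coupling steps extend all features by one base. The sketch below simulates this schedule for a small set of oligos. It is an abstract illustration with hypothetical names, it ignores the chemistry (including the direction of chain growth), and it counts a step even when its mask is empty.

```python
def synthesis_cycles(oligos, bases="ACGT"):
    """Simulate mask-directed synthesis of equal-length oligos.

    Returns (cycles, grown): cycles is a list of (base, mask) pairs, where
    mask is the set of feature indices illuminated before that base is
    coupled; grown is the list of sequences built at each feature.
    """
    if not oligos:
        return [], []
    n = len(oligos[0])
    if any(len(o) != n for o in oligos):
        raise ValueError("all oligos must have the same length")
    grown = [""] * len(oligos)
    cycles = []
    for position in range(n):
        for base in bases:
            # Illuminate only the features that need this base at this position.
            mask = {i for i, o in enumerate(oligos) if o[position] == base}
            for i in mask:
                grown[i] += base
            cycles.append((base, mask))
    return cycles, grown


targets = ["ACGT", "TTTT", "GATC"]
cycles, grown = synthesis_cycles(targets)
```

For these three 4-mers the schedule has 4 x 4 = 16 steps, and it would still have 16 steps for thousands of features, since the number of cycles depends only on the oligo length.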

Sample preparation involves the generation of 
double-stranded cDNA from cellular poly(A)+ RNA 
followed by antisense RNA synthesis in an in vitro
transcription reaction with biotinylated or fluor- 
tagged nucleotides. The RNA probe is then frag- 
mented to facilitate hybridization. If the indirect 
visualization method is used, the chips are incubated 
with fluor-linked streptavidin (e.g., phycoerythrin) 
after hybridization [12,33]. The signal is detected with 
a custom confocal scanner [34]. This method has 
been applied successfully to the mapping of genomic 
library clones [35], to de novo sequencing by hybrid- 
ization [28,36], and to evolutionary sequence com- 
parison of the BRCA1 gene [37]. In addition, 
mutations in the cystic fibrosis [38] and BRCA1 [39] 
gene products and polymorphisms in the human im-
munodeficiency virus-1 clade B protease gene [40]
have been detected by this method. Oligonucleotide 
chips are also useful for expression monitoring [33] 
as has been demonstrated by the simultaneous evalu- 
ation of gene-expression patterns in nearly all open 
reading frames of the yeast strain S. cerevisiae [12]. 
More recently, oligonucleotide chips have been used 
to help identify single nucleotide polymorphisms in 
the human [41] and yeast [42] genomes. 

THE USE OF MICROARRAYS IN TOXICOLOGY 

Screening for Mechanism of Action 

The field of toxicology uses numerous in vivo 
model systems, including the rat, mouse, and rab- 
bit, to assess potential toxicity and these bioassays 
are the mainstay of toxicology testing. However, in 
the past several decades, a plethora of in vitro tech- 
niques have been developed to measure toxicity, 
many of which measure toxicant-induced DNA dam- 
age. Examples of these assays include the Ames test, 
the Syrian hamster embryo cell transformation as- 
say, micronucleus assays, measurements of sister 
chromatid exchange and unscheduled DNA synthe- 
sis, and many others. Fundamental to all of these 
methods is the fact that toxicity is often preceded 
by, and results in, alterations in gene expression. In 
many cases, these changes in gene expression are a 
far more sensitive, characteristic, and measurable
endpoint than the toxicity itself. We therefore pro- 
pose that a method based on measurements of the 
genome-wide gene expression pattern of an organ- 
ism after toxicant exposure is fundamentally infor- 
mative and complements the established methods 
described above. 

We are developing a method by which toxicants 
can be identified and their putative mechanisms of 
action determined by using toxicant-induced gene ex- 
pression profiles. In this method, in one or more de- 
fined model systems, dose and time-course parameters 
are established for a series of toxicants within a given 
prototypic class (e.g., polycyclic aromatic hydrocar- 
bons (PAHs)). Cells are then treated with these agents 
at a fixed toxicity level (as measured by cell survival), 
RNA is harvested, and toxicant-induced gene expres- 
sion changes are assessed by hybridization to a cDNA 
microarray chip (Figure 1). We have developed a cus-
tom DNA chip, called ToxChip v1.0, specifically for
this purpose and will discuss it in more detail below. 
The changes in gene expression induced by the test 
agents in the model systems are analyzed, and the 
common set of changes unique to that class of toxi- 
cants, termed a toxicant signature, is determined. 

This signature is derived by ranking across all ex-
periments the gene-expression data based on rela-
tive fold induction or suppression of genes in treated
samples versus untreated controls and selecting the 
most consistently different signals across the sample 
set. A different signature may be established for each 
prototypic toxicant class. Once the signatures are de- 
termined, gene-expression profiles induced by un- 
known agents in these same model systems can then 
be compared with the established signatures. A match 
assigns a putative mechanism of action to the test 
compound. Figure 2 illustrates this signature method 
for different types of oxidant stressors, PAHs, and 
peroxisome proliferators. In this example, the un- 
known compound in question had a gene-expres- 
sion profile similar to that of the oxidant stressors in 
the database. We anticipate that this general method 
will also reveal cross talk between different pathways 
induced by a single agent (e.g., reveal that a com- 
pound has both PAH-like and oxidant-like proper- 
ties). In the future, it may be necessary to distinguish 
very subtle differences between compounds within 
a very large sample set (e.g., thousands of highly simi- 
lar structural isomers in a combinatorial chemistry 
library or peptide library). To generate these highly 
refined signatures, standard statistical clustering tech- 
niques or principal-component analysis can be used. 
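
The signature method described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the two-fold cutoff (a log2 ratio of 1.0), the rule that a signature gene must change in the same direction in every experiment, the function names, and all gene names and values are assumptions introduced here.

```python
import math

def log2_ratios(treated, control):
    """Per-gene log2(treated / control) for one experiment."""
    return {gene: math.log2(treated[gene] / control[gene]) for gene in treated}

def toxicant_signature(experiments, min_abs_log2=1.0):
    """Genes changed in the same direction in every experiment of a class.

    experiments: list of dicts mapping gene -> log2 ratio.
    Returns a dict mapping gene -> +1 (induced) or -1 (suppressed).
    The cutoff and the all-experiments rule are assumptions.
    """
    signature = {}
    for gene in experiments[0]:
        values = [experiment[gene] for experiment in experiments]
        if all(v >= min_abs_log2 for v in values):
            signature[gene] = 1
        elif all(v <= -min_abs_log2 for v in values):
            signature[gene] = -1
    return signature

def match_score(profile, signature, min_abs_log2=1.0):
    """Fraction of signature genes that the profile changes in the same direction."""
    if not signature:
        return 0.0
    hits = sum(
        1 for gene, direction in signature.items()
        if direction * profile.get(gene, 0.0) >= min_abs_log2
    )
    return hits / len(signature)

# Hypothetical log2 ratios from two oxidant-stressor experiments.
oxidant_runs = [
    {"HMOX1": 2.1, "GCLM": 1.4, "CYP1A1": 0.1, "ACOX1": -0.2},
    {"HMOX1": 1.8, "GCLM": 1.2, "CYP1A1": 1.5, "ACOX1": 0.0},
]
oxidant_signature = toxicant_signature(oxidant_runs)

# Hypothetical profiles of two unknown compounds.
unknown_a = {"HMOX1": 1.6, "GCLM": 1.1, "CYP1A1": 0.2, "ACOX1": 0.1}
unknown_b = {"HMOX1": 0.2, "GCLM": 1.3, "CYP1A1": 3.0, "ACOX1": 0.0}
score_a = match_score(unknown_a, oxidant_signature)
score_b = match_score(unknown_b, oxidant_signature)
```

In this toy data set, the first unknown reproduces every change in the oxidant signature, whereas the second reproduces only half of them. A real analysis would need replicate experiments and a statistical measure of consistency rather than a fixed cutoff.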

For the studies outlined in Figure 2, we developed
the custom cDNA microarray chip ToxChip v1.0.

Figure 1. Simplified overview of the method for sample
preparation and hybridization to cDNA microarrays. For illus-
trative purposes, samples derived from cell culture are depicted,
although other sample types are amenable to this analysis.

Figure 2. Schematic representation of the method for iden-
tification of a toxicant's mechanism of action. In this method,
gene-expression data derived from exposure of model sys-
tems to known toxicants are analyzed, and a set of changes
characteristic to that type of toxicant (termed the toxicant
signature) is identified. As depicted, oxidant stressors produce
consistent changes in group A genes (indicated by red and
green circles), but not group B or C genes (indicated by gray
circles). The set of gene-expression changes elicited by the
suspected toxicant is then compared with these characteristic
patterns, and a putative mechanism of action is assigned to
the unknown agent.



The 2090 human genes that comprise this subarray 
were selected for their well-documented involve- 
ment in basic cellular processes as well as their re- 
sponses to different types of toxic insult. Included 
on this list are DNA replication and repair genes, 
apoptosis genes, and genes responsive to PAHs and 
dioxin-like compounds, peroxisome proliferators, 
estrogenic compounds, and oxidant stress. Some of 
the other categories of genes include transcription 
factors, oncogenes, tumor suppressor genes, cyclins, 
kinases, phosphatases, cell adhesion and motility 
genes, and homeobox genes. Also included in this 
group are 84 housekeeping genes, whose hybridiza- 
tion intensity is averaged and used for signal nor- 
malization of the other genes on the chip. To date, 
very few toxicants have been shown to have appre- 
ciable effects on the expression of these housekeep- 
ing genes. However, this housekeeping list will be 
revised if new data warrant the addition or deletion 
of a particular gene. Table 1 contains a general de- 
scription of some of the different classes of genes 
that comprise ToxChip v1.0.
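
The housekeeping-based normalization mentioned above might look like the following sketch. It assumes that normalization means dividing every signal by the mean housekeeping intensity on the same chip; the function name, the gene names, and the intensity values are hypothetical and are not taken from the ToxChip gene list.

```python
def normalize_to_housekeeping(intensities, housekeeping_genes):
    """Divide every signal by the mean intensity of the housekeeping genes.

    intensities: dict mapping gene -> raw hybridization intensity.
    housekeeping_genes: names of the genes used as the reference set.
    """
    reference = [intensities[g] for g in housekeeping_genes if g in intensities]
    if not reference:
        raise ValueError("no housekeeping genes found on this chip")
    mean_reference = sum(reference) / len(reference)
    if mean_reference <= 0:
        raise ValueError("mean housekeeping intensity must be positive")
    return {gene: value / mean_reference for gene, value in intensities.items()}

# Hypothetical raw intensities from one hybridization.
raw = {"GAPDH": 1000.0, "ACTB": 3000.0, "CYP1A1": 500.0}
normalized = normalize_to_housekeeping(raw, ["GAPDH", "ACTB"])
```

Here the mean housekeeping intensity is 2000, so the CYP1A1 signal of 500 becomes 0.25. Signals scaled in this way can be compared between chips that differ in overall labeling or scanning intensity.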

When a toxicant signature is determined, the 
genes within this signature are flagged within the 
database. When uncharacterized toxicants are then 
screened, the data can be quickly reformatted so that 
blocks of genes representing the different signatures
are displayed [11]. This facilitates rapid, visual in-
terpretation of data. We are also developing Tox-
Chip v2.0 and chips for other model systems,
including rat, mouse, Xenopus, and yeast, for use in 
toxicology studies. 

Animal Models in Toxicology Testing 

The toxicology community relies heavily on the 
use of animals as model systems for toxicology test- 
ing. Unfortunately, these assays are inherently ex- 
pensive, require large numbers of animals and take a 
long time to complete and analyze. Therefore, the 
National Institute of Environmental Health Sciences 
(NIEHS), the National Toxicology Program, and the 
toxicology community at large are committed to re- 
ducing the number of animals used, by developing 
more efficient and alternative testing methodologies. 
Although substantial progress has been made in the 
development of alternative methods, bioassays are 
still used for testing endpoints such as neurotoxic- 
ity, immunotoxicity, reproductive and developmen- 
tal toxicology, and genetic toxicology. The rodent 
cancer bioassay is a particularly expensive and time- 
consuming assay, as it requires almost 4 yr, 1200 
animals, and millions of dollars to execute and ana-
lyze [43]. In vitro experiments of the type outlined
in Figure 2 might provide evidence that an unknown
agent is (or is not) responsible for eliciting a given
biological response. This information would help to
select a bioassay more specifically suited to the agent
in question or perhaps suggest that a bioassay is not
necessary, which would dramatically reduce cost,
animal use, and time.

Table 1. ToxChip v1.0: A Human cDNA Microarray

Gene category*                            No. of genes on chip
Apoptosis                                 72
DNA replication and repair                99
Oxidative stress/redox homeostasis        90
Peroxisome proliferator responsive        22
Dioxin/PAH responsive                     12
Estrogen responsive                       63
Housekeeping                              84
Oncogenes and tumor suppressor genes      76
Cell-cycle control                        51
Transcription factors                     131
Kinases                                   276
Phosphatases                              88
Heat-shock proteins                       23
Receptors                                 349
Cytochrome P450s                          30

*This list is intended as a general guide. The gene categories are not
unique, and some genes are listed in multiple categories.

The addition of microarray techniques to stan- 
dard bioassays may dramatically enhance the sen- 
sitivity and interpretability of the bioassay and 
possibly reduce its cost. Gene-expression signatures 
could be determined for various types of tissue-spe- 
cific toxicants, and new compounds could be 
screened for these characteristic signatures, provid- 
ing a rapid and sensitive in vivo test. Also, because 
gene expression is often exquisitely sensitive to low 
doses of a toxicant, the combination of gene-expres- 
sion screening and the bioassay might allow the use 
of lower toxicant doses, which are more relevant to 
human exposure levels, and the use of fewer ani- 
mals. In addition, gene-expression changes are nor- 
mally measured in hours or days, not in the months 
to years required for tumor development. Further- 
more, microarrays might be particularly useful for 
investigating the relationship between acute and 
chronic toxicity and identifying secondary effects 
of a given toxicant by studying the relationship 
between the duration of exposure to a toxicant and 
the gene-expression profile produced. Thus, a bio- 
assay that incorporates gene-expression signatures 
with traditional endpoints might be substantially 
shorter, use more realistic dose regimens, and cost 
substantially less than the current assays do. 

These considerations are also relevant for branches 
of toxicology not related to human health and not 
using rodents as model systems, such as aquatic toxi-
cology and plant pathology. Bioassays based on the
fathead minnow, Daphnia, and Arabidopsis could
also be improved by the addition of microarray analy-
sis. The combination of microarrays with traditional
bioassays might also be useful for investigating some 
of the more intractable problems in toxicology re- 
search, such as the effects of complex mixtures and 
the difficulties in cross-species extrapolation. 

Exposure Assessment, Environmental Monitoring, 
and Drug Safety 

The currently used methods for assessment of ex- 
posure to chemical toxicants are based on measure- 
ment of tissue toxin levels or on surrogate markers 
of toxicity, termed biomarkers (e.g., peripheral blood 
levels of hepatic enzymes or DNA adducts). Because 
gene expression is a sensitive endpoint, gene expres- 
sion as measured with microarray technology may 
be useful as a new biomarker to more precisely iden- 
tify hazards and to assess exposure. Similarly, 
microarrays could be used in an environmental- 
monitoring capacity to measure the effect of poten- 
tial contaminants on the gene-expression profiles 
of resident organisms. In an analogous fashion, 
microarrays could be used to measure gene-expres- 
sion endpoints in subjects in clinical trials. The com- 
bination of these gene-expression data and more 
established toxic endpoints in these trials could be 
used to define highly precise surrogates of safety. 

Gene-expression profiles in samples from exposed 
individuals could be compared to the profiles of the 
same individuals before exposure. From this infor- 
mation, the nature of the toxic exposure can be de- 
termined or a relative clinical safety factor estimated. 
In the future it may also be possible to estimate not 
only the nature but the dose of the toxicant for a 
given exposure, based on relative gene-expression 
levels. This general approach may be particularly 
appropriate for occupational-health applications, in 
which unexposed and exposed samples from the 
same individuals may be obtainable. For example, 
a pilot study of gene expression in peripheral-blood 
lymphocytes of Polish coke-oven workers exposed 
to PAHs (and many other compounds) is under con- 
sideration at the NIEHS. An important consideration 
for these types of studies is that gene expression can 
be affected by numerous factors, including diet, 
health, and personal habits. To reduce the effects 
of these confounding factors, it may be necessary 
to compare pools of control samples with pools of 
treated samples. In the future it may be possible to 
compare exposed sample sets to a national database 
of human-expression data, thus eliminating the 
need to provide an unexposed sample from the same 
individual. Efforts to develop such a national gene- 
expression database are currently under way [44,45]. 
However, this national database approach will re- 
quire a better understanding of genome-wide gene 
expression across the highly diverse human popu- 
lation and of the effects of environmental factors 
on this expression.

Alleles, Oligo Arrays, and Toxicogenetics

Gene sequences vary between individuals, and 
this variability can be a causative factor in human 
diseases of environmental origin [46,47]. A new area 
of toxicology, termed toxicogenetics, was recently 
developed to study the relationship between genetic 
variability and toxicant susceptibility. This field is 
not the subject of this discussion, but it is worth- 
while to note that the ability of oligonucleotide ar- 
rays to discriminate DNA molecules based on single 
base-pair differences makes these arrays uniquely 
useful for this type of analysis. Recent reports dem- 
onstrated the feasibility of this approach [41,42]. 
The NIEHS has initiated the Environmental Genome 
Project to identify common sequence polymor- 
phisms in 200 genes thought to be involved in en- 
vironmental diseases [48]. In a pilot study on the 
feasibility of this application to the Environmental 
Genome Project, oligonucleotide arrays will be used 
to resequence 20 candidate genes. This toxicogenetic 
approach promises to dramatically improve our un- 
derstanding of interindividual variability in disease 
susceptibility. 

FUTURE PRIORITIES 

There are many issues that must be addressed be- 
fore the full potential of microarrays in toxicology 
research can be realized. Among these are model sys- 
tem selection, dose selection, and the temporal na- 
ture of gene expression. In other words, in which 
species, at what dose, and at what time do we look 
for toxicant-induced gene expression? If human 
samples are analyzed, how variable is global gene 
expression between individuals, before and after toxi- 
cant exposure? What are the effects of age, diet, and 
other factors on this expression? Experience, in the 
form of large data sets of toxicant exposures, will 
answer these questions. 

One of the most pressing issues for array scientists 
is the construction of a national public database 
(linked to the existing public databases) to serve as a 
repository for gene-expression data. This relational 
database must be made available for public use, and 
researchers must be encouraged to submit their ex- 
pression data so that others may view and query the 
information. Researchers at the National Institutes 
of Health have made laudable progress in develop- 
ing the first generation of such a database [44,45]. In 
addition, improved statistical methods for gene clus- 
tering and pattern recognition are needed to ana- 
lyze the data in such a public database. 

The proliferation of different platforms and meth- 
ods for microarray hybridizations will improve 
sample handling and data collection and analysis and 
reduce costs. However, the variety of microarray 
methods available will create problems of data com-
patibility between platforms. In addition, the near-
infinite variety of experimental conditions under
which data will be collected by different laborato-
ries will make large-scale data analysis extremely dif-
ficult. To help circumvent these future problems, a
set of standards to be included on all platforms 
should be established. These standards would facili- 
tate data entry into the national database and serve 
as reference points for cross-platform and inter-labo- 
ratory data analysis. 

Many issues remain to be resolved, but it is clear 
that new molecular techniques such as microarray 
hybridization will have a dramatic impact on toxicol- 
ogy research. In the future, the information gathered 
from microarray-based hybridization experiments will 
form the basis for an improved method to assess the 
impact of chemicals on human and environmental 
health.

Abstract

Recent progress in genomics and proteomics technologies has created a unique opportunity to significantly impact 
the pharmaceutical drug development processes. The perception that cells and whole organisms express specific 
inducible responses to stimuli such as drug treatment implies that unique expression patterns, molecular fingerprints, 
indicative of a drug's efficacy and potential toxicity are accessible. The integration into state-of-the-art toxicology of 
assays allowing one to profile treatment-related changes in gene expression patterns promises new insights into 
mechanisms of drug action and toxicity. The benefits will be improved lead selection, and optimized monitoring of 
drug efficacy and safety in pre-clinical and clinical studies based on biologically relevant tissue and surrogate markers. 
© 2000 Elsevier Science Ireland Ltd. All rights reserved.

1. Introduction

The majority of drugs act by binding to protein 
targets, most to known proteins representing en- 
zymes, receptors and channels, resulting in effects 
such as enzyme inhibition and impairment of 
signal transduction. The treatment-induced per- 
turbations provoke feedback reactions aiming to 
compensate for the stimulus, which almost always 
are associated with signals to the nucleus, result-
ing in altered gene expression. Such gene expres-
sion regulations account for both the
pharmacological action and the toxicity of a drug
and can be visualized by either global mRNA or
global protein expression profiling. Hence, for
each individual drug, a characteristic gene regula-
tion pattern, its molecular fingerprint, exists
which bears valuable information on its mode of
action and its mechanism of toxicity.

* Corresponding author. Tel: +1-301-4245989; fax: +1-301-7624892.
E-mail address: steiner@lsbc.com (S. Steiner)

Gene expression is a multistep process that 
results in an active protein (Fig. 1). There exist 
numerous regulation systems that exert control at 
and after the transcription and the translation 
step. Genomics, by definition, encompasses the 
quantitative analysis of transcripts at the mRNA 
level, while the aim of proteomics is to quantify 
gene expression further down-stream, creating a 
snapshot of gene regulation closer to ultimate cell 
function control.

2. Global mRNA profiling

Expression data at the mRNA level can be 
produced using a set of different technologies 
such as DNA microarrays, reverse transcript 
imaging, amplified fragment length polymorphism 
(AFLP), serial analysis of gene expression 
(SAGE) and others. Currently, DNA microarrays 
are very popular and promise a great potential. 
On a typical array, each gene of interest is repre- 
sented either by a long DNA fragment (200-2400 
bp) typically generated by polymerase chain reac- 
tion (PCR) and spotted on a suitable substrate 
using robotics (Schena et al., 1995; Shalon et al., 
1996) or by several short oligonucleotides (20-30 
bp) synthesized directly onto a solid support using 
photolabile nucleotide chemistry (Fodor et al., 
1991; Chee et al., 1996). From control and treated 
tissues, total RNA or mRNA is isolated and 
reverse transcribed in the presence of radioactively
or fluorescently labeled nucleotides, and the labeled
probes are then hybridized to the arrays. The 
intensity of the array signal is measured for each 
gene transcript by either autoradiography or laser 
scanning confocal microscopy. The ratio between 
the signals of control and treated samples reflects
the relative drug-induced change in transcript 
abundance. 
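
As a worked illustration of the signal ratio described above, the following Python sketch converts paired control and treated array signals into log2 fold changes. The direction of the ratio (treated over control), the signal floor used to avoid division by zero, the function name, and the transcript names and values are all assumptions made for this example.

```python
import math

def fold_changes(control, treated, floor=1.0):
    """Return log2(treated / control) for each transcript measured in both samples.

    Signals below `floor` are raised to `floor` so that background-level
    spots do not produce a division by zero or the logarithm of zero.
    """
    changes = {}
    for gene in control.keys() & treated.keys():
        c = max(control[gene], floor)
        t = max(treated[gene], floor)
        changes[gene] = math.log2(t / c)
    return changes

# Hypothetical array signals (arbitrary units).
control_signal = {"A": 200.0, "B": 800.0, "C": 0.0}
treated_signal = {"A": 800.0, "B": 200.0, "C": 4.0}
changes = fold_changes(control_signal, treated_signal)
```

On a log2 scale, a four-fold induction (transcript A) and a four-fold suppression (transcript B) appear as +2 and -2, which makes changes in the two directions directly comparable.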



3. Global protein profiling 

Global quantitative expression analysis at the 
protein level is currently restricted to the use of 
two-dimensional gel electrophoresis. This tech- 
nique combines separation of tissue proteins by 
isoelectric focusing in the first dimension and by 
sodium dodecyl sulfate slab gel electrophoresis- 
based molecular weight separation on the second, 
orthogonal dimension (Anderson et al., 1991). 
The product is a rectangular pattern of protein 
spots that are typically revealed by Coomassie 
Blue, silver or fluorescent staining (Fig. 2). 
Protein spots are identified by mass spectrometry 
following generation of peptide mass fingerprints 
(Mann et al., 1993) and sequence tags (Wilkins et 
al., 1996). Similar to the mRNA approach, the 
ratio between the optical densities of spots from
control and treated samples is examined to
search for treatment-related changes. 

4. Expression data analysis 

Bioinformatics forms a key element required to 
organize, analyze and store expression data from 
either source, the mRNA or the protein level. The 
overall objective, once a mass of high-quality
quantitative expression data has been collected, is
to visualize complex patterns of gene expression
changes, to detect pathways and sets of genes
tightly correlated with treatment efficacy and toxi-
city, and to compare the effects of different sets of
treatment (Anderson et al., 1996). As the drug
effect database is growing, one may detect similar-
ities and differences between the molecular finger-
prints produced by various drugs, information
that may be crucial to make a decision whether to
refocus or extend the therapeutic spectrum of a
drug candidate.

Fig. 1. Production of an active protein is a multistep process in which numerous regulation systems exert control at various stages
of expression. Molecular fingerprints of drugs can be visualized through expression profiling at the mRNA level (genomics) using
a variety of technologies and at the protein level (proteomics) using two-dimensional gel electrophoresis.

Fig. 2. Computerized representation of a Coomassie Blue stained two-dimensional gel electrophoresis pattern of Fischer F344 rat
liver homogenate.



5. Comparison of global mRNA and protein 
expression profiling 

There are several synergies and overlaps of data 
obtained by mRNA and protein expression analy- 
sis. Low abundant transcripts may not be easily 
quantified at the protein level using standard two- 
dimensional gel electrophoresis analysis and their 
detection may require prefractionation of sam- 
ples. The expression of such genes may be prefer- 
ably quantified at the mRNA level using 
techniques allowing PCR-mediated target amplifi-
cation. Tissue biopsy samples typically yield good
quality of both mRNA and proteins; however, the 
quality of mRNA isolated from body fluids is 
often poor due to the faster degradation of 
mRNA when compared with proteins. RNA sam- 
ples from body fluids such as serum or urine are 
often not very 'meaningful', and secreted proteins 
are likely more reliable surrogate markers for 
treatment efficacy and safety. Detection of post- 
translational modifications, events often related to 
function or nonfunction of a protein, is restricted 
to protein expression analysis and rarely can be 
predicted by mRNA profiling. Information on 
subcellular localization and translocation of 
proteins has to be acquired at the level of the 
protein in combination with sample prefractiona- 
tion procedures. The growing evidence of a poor 
correlation between mRNA and protein abun- 
dance (Anderson and Seilhamer, 1997) further 
suggests that the two approaches, mRNA and 
protein profiling, are complementary and should 
be applied in parallel. 
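
The degree of agreement between mRNA and protein abundance can be quantified with a correlation coefficient. The sketch below computes a Pearson coefficient in plain Python; the choice of this statistic, the function name, and the abundance values are assumptions made for illustration and are not data from the cited study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical abundances for the same four gene products.
mrna = [1.0, 2.0, 3.0, 4.0]
protein_proportional = [2.0, 4.0, 6.0, 8.0]
protein_unrelated = [1.0, -1.0, -1.0, 1.0]
r_proportional = pearson(mrna, protein_proportional)
r_unrelated = pearson(mrna, protein_unrelated)
```

A coefficient near 1 would indicate that transcript levels predict protein levels well; values near 0, as in the second toy example, correspond to the poor correlation that motivates measuring both levels in parallel.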

6. Expression profiling and drug development 

Understanding the mechanisms of action and 
toxicity, and being able to monitor treatment 
efficacy and safety during trials is crucial for the 
successful development of a drug. Mechanistic 
insights are essential for the interpretation of drug 
effects and enhance the chances of recognizing 
potential species specificities contributing to an 
improved risk profile in humans (Richardson et 
al., 1993; Steiner et al., 1996b; Aicher et al., 1998). 
The value of expression profiling further increases 
when links between treatment-induced expression 
profiles and specific pharmacological and toxic 
endpoints are established (Anderson et al., 1991, 
1995, 1996; Steiner et al., 1996a). Changes in gene
expression are known to precede the manifesta- 
tion of morphological alterations, giving expres- 
sion profiling a great potential for early 
compound screening, enabling one to select drug 
candidates with wide therapeutic windows 
reflected by molecular fingerprints indicative of 
high pharmacological potency and low toxicity 
(Arce et al., 1998). In later phases of drug devel-
opment, surrogate markers of treatment efficacy
and toxicity can be applied to optimize the moni- 
toring of pre-clinical and clinical studies (Doherty 
et al., 1998). 



7. Perspectives 

The basic methodology of safety evaluation has 
changed little during the past decades. Toxicity in 
laboratory animals has been evaluated primarily 
by using hematological, clinical chemistry and 
histological parameters as indicators of organ 
damage. The rapid progress in genomics and pro- 
teomics technologies creates a unique opportunity 
to dramatically improve the predictive power of 
safety assessment and to accelerate the drug devel- 
opment process. Application of gene and protein 
expression profiling promises to improve lead se- 
lection, resulting in the development of drug can- 
didates with higher efficacy and lower toxicity. 
The identification of biologically relevant surro- 
gate markers correlated with treatment efficacy 
and safety bears a great potential to optimize the 
monitoring of pre-clinical and clinical trials.

DNA array technology makes it possible to rapidly genotype individuals or quantify the expression
of thousands of genes on a single filter or glass slide, and holds enormous potential in toxicologic 
applications. This potential led to a U.S. Environmental Protection Agency-sponsored workshop 
titled "Application of Microarrays to Toxicology" on 7-8 January 1999 in Research Triangle Park,
North Carolina. In addition to providing state-of-the-art information on the application of DNA or 
gene microarrays, the workshop catalyzed the formation of several collaborations, committees, and 
user's groups throughout the Research Triangle Park area and beyond. Potential applications of
microarrays to toxicologic research and risk assessment include genome-wide expression analyses to 
identify gene-expression networks and toxicant-specific signatures that can be used to define mode 
of action, for exposure assessment, and for environmental monitoring. Arrays may also prove useful 
for monitoring genetic variability and its relationship to toxicant susceptibility in human populations.

107:681-685 (1999). [Online 6 July 1999]
http://ehpnet1.niehs.nih.gov/docs/1999/107p681-685



Decoding the genetic blueprint is a dream that 
offers manifold returns in terms of understand- 
ing how organisms develop and function in an 
often hostile environment. With the rapid 
advances in molecular biology over the last 30 
years, the dream has come a step closer to reali- 
ty. Molecular biologists now have the ability to 
elucidate the composition of any genome. 
Indeed, almost 20 genomes have already been 
sequenced and more than 60 are currently 
under way. Foremost among these is the 
Human Genome Mapping Project. However, 
the genomes of a number of commonly used 
laboratory species are also under intensive 
investigation, including yeast, Arabidopsis, 
maize, rice, zebra fish, mouse, rat, and dog. It 
is widely expected that the completion of such 
programs will facilitate the development of 
many powerful new techniques and approach- 
es to diagnosing and treating genetically and 
environmentally induced diseases which afflict 
mankind. However, the vast amount of data 
being generated by genome mapping will 
require new high-throughput technologies to
investigate the function of the millions of new 
genes that are being reported. Among the most 
widely heralded of the new functional 
genomics technologies are DNA arrays, which 
represent perhaps the most anticipated new 
molecular biology technique since polymerase 
chain reaction (PCR).

Arrays enable the study of literally thou- 
sands of genes in a single experiment. The 
potential importance of arrays is enormous and 
has been highlighted by the recent publication 
of an entire Nature Genetics supplement dedi-
cated to the technology (1). Despite this huge
surge of interest, DNA arrays are still little used 
and largely unproven, as demonstrated by the 
high ratio of review and press articles to actual 
data papers. Even so, the potential they offer
has driven venture capitalists into a frenzy of
investment and many new companies are 
springing up to claim a share of this rapidly 
developing market. 

The U.S. Environmental Protection 
Agency (EPA) is interested in applying DNA 
array technology to ongoing toxicologic stud- 
ies. To learn more about the current state of 
the technology, the Reproductive Toxicology 
Division (RTD) of the National Health and 
Environmental Effects Research Laboratory 
(NHEERL; Research Triangle Park, NC) 
hosted a workshop on "Application of 
Microarrays to Toxicology" on 7-8 January 
1999 in Research Triangle Park, North 
Carolina. The workshop was organized by 
David Dix, Robert Kavlock, and John Rockett
of the RTD/NHEERL. Twenty-two intra- 
mural and extramural scientists from govern- 
ment, academia, and industry shared informa- 
tion, data, and opinions on the current and 
future applications for this exciting new tech- 
nology. The workshop had more than 150 
attendees, including researchers, students, and 
administrators from the EPA, the National 
Institute of Environmental Health Sciences 
(NIEHS), and a number of other establish- 
ments from Research Triangle Park and 
beyond. Presentations ranged from the tech- 
nology behind array production through the 
sharing of actual experimental data and projec- 
tions on the future importance and applica- 
tions of arrays. The information contained in 
the workshop presentations should provide aid 
and insight into arrays in general and their 
application to toxicology in particular. 

Array Elements

In the context of molecular biology, the word
"array" is normally used to refer to a series of
DNA or protein elements firmly attached in
a regular pattern to some kind of supportive
medium. DNA array is often used inter- 
changeably with gene array or microarray. 
Although not formally defined, microarray is 
generally used to describe the higher density 
arrays typically printed on glass chips. The 
DNA elements that make up DNA arrays 
can be oligonucleotides, partial gene 
sequences, or full-length cDNAs. Companies 
offering pre-made arrays that contain less
than full-length clones normally use regions 
of the genes which are specific to that gene to 
prevent false positives arising through cross- 
hybridization. Sequence verification of 
cDNA clone identity is necessary because of 
errors in identifying specific clones from 
cDNA libraries and databases. Premade
DNA arrays printed on membranes are cur-
rently or imminently available for human,
mouse, and rat. In most cases they contain
DNA sequences representing several thou-
sand different sequence clusters or genes as
delineated through the National Center for
Biotechnology Information UniGene Project
(2). Many of these different UniGene clusters
(putative genes) are represented only by 
expressed sequence tags (ESTs). 

Array Printing 

Arrays are typically printed on one of two 
types of support matrix. Nylon membranes 
are used by most off-the-shelf array providers 
such as Clontech Laboratories, Inc. 
(Palo Alto, CA), Genome Systems, Inc. (St. 
Louis, MO), and Research Genetics, Inc. 
(Huntsville, AL). Microarrays such as those 
produced by Affymetrix, Inc. (Santa Clara, 
CA), Incyte Pharmaceuticals, Inc. (Palo Alto, 
CA), and many do-it-yourself (DIY) arraying 
groups use glass wafers or slides. Although 
standard microscope slides may be used, they 
must be preprepared to facilitate sticking 
of the DNA to the glass. Several different
coatings have been successfully used, includ-
ing silane and lysine. The coating of slides
can easily be carried out in the laboratory, 
but many prefer the convenience of precoated 
slides available from suppliers. 

Once the support matrix has been pre- 
pared, the DNA elements can be applied by 
several methods. Affymetrix, Inc., has devel- 
oped a unique photolithographic technology 
for attaching oligonucleotides to glass wafers. 
More commonly, DNA is applied by either 
noncontact or contact printing. Noncontact 
printers can use thermal, solenoid, or piezoelec- 
tric technology to spray aliquots of solution 
onto the support matrix and may be used to 
produce slide or membrane-based arrays. 
Cartesian Technologies, Inc. (Irvine, CA) has 
developed nQUAD technology for use in its 
PixSys printers. The system couples a syringe 
pump with the microsolenoid valve, a combi- 
nation that provides rapid quantitative dispens-
ing of nanoliter volumes (down to 4.2 nL) over
a variable volume range. A different approach 
to noncontact printing uses a solid pin and ring 
combination (Genetic MicroSystems, Inc., 
Woburn, MA). This system (Figure 1) allows a 
broader range of sample, including cell suspen- 
sions and particulates, because the printing 
head cannot be blocked up in the same way as 
a spray nozzle. Fluid transfer is controlled in 
this system primarily by the pin dimensions 
and the force of deposition, although the 
nature of the support matrix and the sample 
will also affect transfer to some degree. 

In contact printing, the pin head is dipped 
in the sample and then touched to the support 
matrix to deposit a small aliquot. Split pins 
were one of the first contact-printing devices 
to be reported and are the suggested format 
for DIY arrayers, as described by Brown (3). 
Split pins are small metal pins with a precise 
groove cut vertically in the middle of the pin 
tip. In this system, 1-48 split pins are posi- 
tioned in the pin-head. The split pins work by 
simple capillary action, not unlike a fountain 
pen — when the pin heads are dipped in the
sample, liquid is drawn into the pin groove. A 
small (fixed) volume is then deposited each 
time the split pins are gently touched to 
the support matrix. Sample (100-500 pL 
depending on a variety of parameters) can be 
deposited on multiple slides before refilling is 
required, and array densities of > 2,500 
spots/cm² may be produced. The deposit vol-
ume depends on the split size, sample fluidi-
ty, and the speed of printing. Split pins are
relatively simple to produce and can be made 
in-house if a suitable machine shop is avail- 
able. Alternatively, they can be obtained 
directly from companies such as TeleChem 
International, Inc. (Sunnyvale, CA). 
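
As a rough check on the density figure quoted above, the center-to-center spot spacing can be worked out directly. This is only a sketch: it assumes a regular square grid, which the workshop report does not state, and the function name is illustrative.

```python
import math

def spot_pitch_um(spots_per_cm2: float) -> float:
    """Center-to-center spacing in micrometers, assuming a square grid (an assumption)."""
    if spots_per_cm2 <= 0:
        raise ValueError("density must be positive")
    spots_per_cm = math.sqrt(spots_per_cm2)  # spots along one 1 cm edge
    return 10_000.0 / spots_per_cm           # 1 cm = 10,000 micrometers

print(spot_pitch_um(2500))
```

Under that assumption, 2,500 spots/cm² corresponds to 50 spots per centimeter, or a pitch of about 200 micrometers.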

Irrespective of their source, printers 
should be run through a preprint sequence 
prior to producing the actual experimental
arrays; the first 100 or so spots of a new run
tend to be somewhat variable. Factors affect-
ing spot reproducibility include slide treat-
ment homogeneity, sample differences, and
instrument errors. Other factors that come 
into play include clean ejection of the drop 
and clogging (nQUAD printing) and 
mechanical variations and long-term alter- 
ation in print-head surface of solid and split 
pins. However, with careful preparation it is 
possible to get a coefficient of variation for
spot reproducibility below 10%. 
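
The reproducibility target above can be checked with a short calculation. The sketch below is illustrative only: the function names, the use of the sample standard deviation, and the idea of discarding a fixed number of preprint spots are assumptions, not a procedure given in the report.

```python
import statistics

def coefficient_of_variation(intensities):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    if len(intensities) < 2:
        raise ValueError("need at least two replicate spots")
    mean = statistics.mean(intensities)
    if mean == 0:
        raise ValueError("mean intensity is zero")
    return 100.0 * statistics.stdev(intensities) / mean

def passes_print_qc(intensities, skip_first=100, max_cv_percent=10.0):
    """Drop the variable preprint spots, then test the rest against the CV limit."""
    usable = intensities[skip_first:]
    return coefficient_of_variation(usable) < max_cv_percent
```

For example, `passes_print_qc(readings)` would ignore the first 100 readings of a run and report whether the remainder meets the 10% figure.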

One potential printing problem is sample 
carryover. Repeated washing, blotting, and 
drying (vacuum) of print pins between samples 
is normally effective at reducing sample carry- 
over to negligible amounts. Printing should 
also be carried out in a controlled environ- 
ment. Humidified chambers are available in 
which to place printers. These help prevent 
dust contamination and produce a uniform 
drying rate, which is important in determining
spot size, quality, and reproducibility. 

In summary, although several printing 
technologies are available, none are par- 
ticularly outstanding and the bottom line 
is that they are still in a relatively early stage 
of evolution. 

Array Hybridization 

The hybridization protocol is, practically 
speaking, relatively straightforward and those 
with previous experience in blotting should 
have little difficulty. Array hybridizations 
are, in essence, reverse Southern/Northern 
blots — instead of applying a labeled probe to 
the target population of DNA/RNA, the 
labeled population is applied to the probe(s). 
With membrane-based arrays, the control and
treated mRNA populations are normally con-
verted to cDNA and labeled with isotope (e.g.,
³³P) in the process. These labeled populations
are then hybridized independently to parallel
or serial arrays and the hybridization signal is
detected with a phosphorimager. A less com-
monly used alternative to radioactive probes is
enzymatic detection. The probe may be 
biotinylated, haptenylated, or have alkaline 
phosphatase/horseradish peroxidase attached.
Hybridization is detected by enzymatic reac-
tion yielding a color reaction (4). Differences
in hybridization signals can be detected by eye 
or, more accurately, with the help of digital 
imaging and commercially available software. 
The labeling of the test populations for slide- 
based microarrays uses a slightly different 
approach. The probe typically consists of two
samples of polyA+ RNA (usually from a treated
and a control population) that are converted to 
cDNA; in the process each is labeled with a 
different fluor. The independently labeled 
probes are then mixed together and hybridized 
to a single microarray slide and the resulting 
combined fluorescent signal is scanned. After
normalization, it is possible to determine the
ratio of fluorescent signals from a single
hybridization of a slide-based microarray.

Figure 1. Genetic Microsystems (Woburn, MA) pin
ring system for printing arrays. The pin ring com-
bination consists of a circular open ring oriented
parallel to the sample solution, with a vertical pin
centered over the ring. When the ring is dipped
into a solution and lifted, it withdraws an aliquot
of sample held by surface tension. To spot the
sample, the pin is driven down through the ring
and a portion of the solution is transferred to the
bottom of the pin. The pin continues to move
downward until the pendant drop of solution
makes contact with the underlying surface. The
pin is then lifted, and gravity and surface tension
cause deposition of the spot onto the array.
Figure from Flowers et al. (14), with permission
from Genetic Microsystems.
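
The ratio calculation for a two-color slide can be sketched as follows. The report does not say which normalization is used; total-intensity scaling is assumed here purely for illustration, and the function name is hypothetical.

```python
import math

def normalized_log_ratios(cy3, cy5):
    """Scale Cy5 so both channels have equal total signal, then return log2(Cy5/Cy3) per spot.

    Total-intensity scaling is an assumed normalization, not one specified in the text.
    Intensities must be positive.
    """
    if len(cy3) != len(cy5):
        raise ValueError("channels must have the same number of spots")
    if any(v <= 0 for v in cy3) or any(v <= 0 for v in cy5):
        raise ValueError("intensities must be positive")
    scale = sum(cy3) / sum(cy5)
    return [math.log2((red * scale) / green) for green, red in zip(cy3, cy5)]
```

A value of 0 means no change between the two populations, +1 a twofold increase in the Cy5-labeled sample, and -1 a twofold decrease.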

cDNA derived from control and treated 
populations of RNA is most commonly 
hybridized to arrays, although subtractive 
hybridization or differential display reactions 
may also be used. Fluorophore- or radiola- 
beled nucleotides are directly incorporated 
into the cDNA in the process of converting 
RNA to cDNA. Alternatively, 5' end-labeled 
primers may be used for cDNA synthesis. 
These are labeled with a fluorophore for 
direct visualization of the hybridized array. 
Alternatively, biotin or a hapten may be 
attached to the primer, in which case fluor- 
labeled streptavidin or antibody must be 
applied before a signal can be generated. The 
most commonly used fluorophores at present 
are cyanine (Cy)3 and Cy5 (Amersham 
Pharmacia Biotech AB, Uppsala, Sweden). 
However, the relative expense of these fluo- 
rescent conjugates has driven a search for 
cheaper alternatives. Fluorescein, rhodamine,
and Texas red have all been used, and 
companies such as Molecular Probes, Inc. 
(Eugene, OR) are developing a series of 
labeled nucleotides with a wide range of exci- 
tation and emission spectra which may prove 
to function as well as the Cy dyes.

Table 1. Advantages and disadvantages of different microarray scanning systems.

System                      Advantages                         Disadvantages
--------------------------  ---------------------------------  -------------------------------------
CCD camera system           Few moving parts                   Less appropriate for dim samples
                            Fast scanning of bright samples    Optical scatter can limit performance

Nonconfocal laser scanner   Relatively simple optics           Low light collection efficiency
                                                               Background artifacts not rejected
                                                               Resolution typically low

Confocal laser scanner      Small depth of focus reduces       Small depth of focus requires
                            artifacts                          scanning precision
                            May have high light collection
                            efficiency

Analysis of DNA Microarrays 

Membrane-based arrays are normally analyzed 
on film or with a phosphorimager, whereas 
chip-based arrays require more specialized scan- 
ning devices. These can be divided into three 
main groups: the charge-coupled device camera 
systems, the nonconfocal laser scanners, and the 
confocal laser scanners. The advantages and dis-
advantages of each system are listed in Table 1.

Because a typical spot on a microarray can 
contain > 10⁸ molecules, it is clear that a large
variation in signal strength may occur. 
Current scanners cannot work across this 
many orders of magnitude (4 or 5 is more typ- 
ical). However, the scanning parameters can 
normally be adjusted to collect more or less 
signal, such that two or three scans of the same 
array should permit the detection of rare and 
abundant genes. 
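
One way to combine two scans of the same array is sketched below. It assumes a 16-bit detector ceiling and a known gain ratio between the scans; both are illustrative assumptions, as is the function name, and no particular scanner's software is implied.

```python
SATURATION = 65535  # assumed 16-bit detector ceiling

def merge_scans(high_gain, low_gain, gain_ratio, saturation=SATURATION):
    """Use the high-gain reading unless it is saturated; otherwise rescale the low-gain one.

    gain_ratio is the (assumed known) factor by which the high-gain scan
    amplifies signal relative to the low-gain scan.
    """
    if len(high_gain) != len(low_gain):
        raise ValueError("scans must cover the same spots")
    merged = []
    for hi, lo in zip(high_gain, low_gain):
        if hi < saturation:
            merged.append(float(hi))       # rare message: keep the sensitive scan
        else:
            merged.append(lo * gain_ratio)  # abundant message: extrapolate from low gain
    return merged
```

The merged values then span a wider range than either scan alone, which is the point of repeating the scan at different settings.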

When a microarray is scanned, the fluores- 
cent images are captured by software normally 
included with the scanner. Several commercial 
suppliers provide additional software for quan-
tifying array images, but the software tools are
constantly evolving to meet the developing 
needs of researchers, and it is prudent to 
define one's own needs and clarify the exact 
capabilities of the software before its purchase. 
Issues that should be considered include the 
following: 

• Can the software locate offset spots? 

• Can it quantitate across irregular hybridiza-
tion signals?

• Can the arrayed genes be programmed in for 
easy identification and location? 

• Can the software connect via the Internet to 
databases containing further information on 
the gene(s) of interest?

One of the key issues raised at the work- 
shop was the sensitivity of microarray technol- 
ogy. Experiments by General Scanning, Inc. 
(Watertown, MA), have shown that by using 
the Cy dyes and their scanner, signal can be 
detected down to levels of < 1 fluor molecule 
per square micrometer, which translates to 
detecting a rare message at approximately one 
copy per cell or less. 

Array Applications 

Although arrays are an emerging technology 
certain to undergo improvement and 
alteration, they have already been applied use-
fully to a number of model systems. Arrays are
at their most powerful when they contain the 
entire genome of the species they are being 
used to study. For this reason, they have strong 
support among researchers utilizing yeast and 
Caenorhabditis elegans (5). The genomes of
both of these species have been sequenced and,
in the case of yeast, deposited onto arrays for
examination of gene expression (6,7). With
both of these species, it is relatively easy to
perturb individual gene expression. Indeed, C.
elegans knockouts can be made simply by
soaking the worms in an antisense solution of
the gene to be knocked out.

Table 1 notes: CCD, charge-coupled device.
From Kawasaki ( 73).

By a process of systematic gene disrup- 
tion, it is now possible to examine the cause 
and effect relationships between different 
genes in these simple organisms. This kind of 
approach should help elucidate biochemical 
pathways and genetic control processes, 
deconvolute polygenic interactions, and 
define the architecture of the cellular network. 
A simple case study of how this can be 
achieved was presented by Butow [University 
of Texas Southwestern Medical Center, 
Dallas, TX (Figure 2)]. Although it is the 
phenotypic result of a single gene knockout 
that is being examined, the effect of such 
perturbation will almost always be polygenic.
Polygenic interactions will become increasing-
ly important as researchers begin to move
away from single gene systems when examin- 
ing the nature of toxicologic responses to 
external stimuli. This is especially important 
in toxicology because the phenotype pro- 
duced by a given environmental insult is 
never the result of the action of a single gene; 
rather, it is a complex interaction of one or 
multiple cellular pathways. Phenomena such 
as quantitative trait (the continuous variation 
of phenotype), epistasis (the effect of alleles of 
one or more genes on the expression of other 
genes), and penetrance (proportion of indi- 
viduals of a given genotype that display a par- 
ticular phenotype) will become increasingly 
evident and important as toxicologists push 
toward the ultimate goal of matching the 
responses of individuals to different 
environmental stimuli. 
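
The knockout reasoning described above (delete a regulator, then see whether the target gene's expression falls or rises) can be written as a small decision rule. This is a sketch only; the twofold cutoff and the function name are assumptions made for illustration.

```python
def classify_regulator(target_wild_type, target_knockout, min_fold=2.0):
    """Infer a deleted gene's role from the target's expression before and after knockout.

    min_fold is an assumed cutoff, not a value from the workshop report.
    """
    if target_wild_type <= 0 or target_knockout <= 0:
        raise ValueError("expression values must be positive")
    ratio = target_knockout / target_wild_type
    if ratio <= 1.0 / min_fold:
        return "positive regulator"   # target drops when the regulator is removed
    if ratio >= min_fold:
        return "negative regulator"   # target rises when the regulator is removed
    return "no clear effect"
```

Applied across every gene on a whole-genome array, the same comparison would flag all targets affected by a single knockout, which is where the polygenic effects become visible.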

Analysis of the transcriptome (the expres- 
sion level of all the genes in a given cell popula- 
tion) was a use of arrays addressed by several 
speakers. Unfortunately, current gene nomen- 
clature is often confusing in that single genes 
are allocated multiple names (usually as a result 
of independent discovery by different laborato- 
ries), and there was a call for standardization of 
gene nomenclature. Nevertheless, once a tran- 
scriptome has been assembled it can then be 
transferred onto arrays and used to screen any 
chosen system. The EPA MicroArray 
Consortium (EPAMAC) is assembling testes



transcriptomes for human, rat, and mouse. In a 
slightly different approach, Nuwaysir et al. (8)
describe how the NIEHS assembled what is
effectively a "toxicological transcriptome" — a 
library of human and mouse genes that have 
previously been proven or implicated in 
responses to toxicologic insults. Clontech 
Laboratories, Inc. (Palo Alto, CA), has begun a 
similar process by developing stress/toxicology 
filter arrays of rat, mouse, and human genes. 
Thus, rather than being tissue or cell specific, 
these stress/toxicology arrays can be used across 
a variety of model systems to look for alter- 
ations in the expression of toxicologically 
important genes and define the new field of 
toxicogenomics. The potential to identify toxi- 
cant families based on tissue- or cell-specific 
gene expression could revolutionize drug test- 
ing. These molecular signatures or fingerprints 
could not only point to the possible 
toxicity/carcinogenicity of newly discovered 
compounds (Figure 3), but also aid in elucidat- 
ing their mechanism of action through identifi- 
cation of gene expression networks. By exten- 
sion, such signatures could provide easily iden-
tifiable biomarkers to assess the degree, time,
and nature of exposure. 
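
Matching a test compound's expression signature against known toxicant families could be sketched as a correlation search. Everything here is illustrative: the choice of Pearson correlation, the 0.8 cutoff, and the function names are assumptions, not a method described at the workshop.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("profile has no variation")
    return cov / (sx * sy)

def best_matching_family(test_profile, reference_profiles, min_r=0.8):
    """Return the best-correlated family name, or None if nothing reaches min_r."""
    best_name, best_r = None, min_r
    for name, profile in reference_profiles.items():
        r = pearson(test_profile, profile)
        if r >= best_r:
            best_name, best_r = name, r
    return best_name
```

A compound that returns `None` matches no known family, while one that returns a family name would be flagged as sharing that family's signature.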

DNA arrays are primarily a tool for exam- 
ining differential gene expression in a given 
model. In this context they are referred to as 
closed systems because they lack the ability of 
other differential expression technologies, e.g.,
differential display and subtractive hybridiza- 
tion, to detect previously unknown genes not 
present on the array. This would appear to 
limit the power of DNA arrays to the imagina- 
tions and preconceptions of the researcher in 
selecting genes previously characterized and 
thought to be involved in the model system. 
However, the various genome sequencing pro- 
jects have created a new category of 
sequence, the EST, that has partially molli-
fied this deficiency. ESTs are cDNAs expressed
in a given tissue that, although they may share 
some degree of sequence similarity to previous- 
ly characterized genes, have not been assigned 
specific genetic identity. By incorporating EST 
clones into an array, it is possible to monitor 
the expression of these unknown genes. This 
can enable the identification of previously 
uncharacterized genes that may have biologic
significance in the model system. Filter arrays
from Research Genetics and slide arrays from 
Incyte Pharmaceuticals both incorporate large 
numbers of ESTs from a variety of species. 

A further use of microarrays is the identifi-
cation of single nucleotide polymorphisms
(SNPs). These genomic variations are abun- 
dant — they occur approximately every 1 kb or 
so — and are the basis of restriction fragment 
length polymorphism analysis used in forensic 
analysis. Affymetrix, Inc., designed chips that 
contain multiple repeats of the same gene 
sequence. Each position is present with all four 
possible bases. After the hybridization of the 
sample, the degree of hybridization to the dif- 
ferent sequences can be measured and the exact 
sequence of the target gene deduced. SNPs are 
thought to be of vital importance in drug 
metabolism and toxicology. For example, sin- 
gle base differences in the regulatory region or 
active site of some genes can account for huge 
differences in the activity of that gene. Such 
SNPs are thought to explain why some people 
are able to metabolize certain xenobiotics bet- 
ter than others. Thus, arrays provide a further 
tool for the toxicologist investigating the 
nature of susceptible subpopulations and toxi- 
cologic response. 
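
The base-calling idea described above (each position represented by four probes, one per base, with the strongest hybridization taken as the call) can be sketched as follows. The data layout, the ambiguity threshold, and the function names are assumptions for illustration and do not describe any vendor's actual algorithm.

```python
def call_bases(probe_signals, min_ratio=1.5):
    """Call one base per position from four probe intensities.

    probe_signals: list of dicts mapping 'A', 'C', 'G', 'T' to intensity.
    A position is called 'N' when the top signal is not clearly above the runner-up.
    """
    calls = []
    for signals in probe_signals:
        ranked = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)
        (best_base, best), (_, second) = ranked[0], ranked[1]
        if best <= 0 or (second > 0 and best / second < min_ratio):
            calls.append("N")
        else:
            calls.append(best_base)
    return "".join(calls)

def differences(called, reference):
    """List (position, reference base, observed base) where a confident call differs."""
    return [(i, ref, obs)
            for i, (ref, obs) in enumerate(zip(reference, called))
            if obs not in (ref, "N")]
```

Positions returned by `differences` are candidate single-base variants relative to the reference sequence.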

There are still many wrinkles to be ironed 
out before arrays become a standard tool for 
toxicologists. The main issues raised at the 
workshop by those with hands-on experience 
were the following: 

• Expense: the cost of purchasing/contracting 
this technology is still too great for many 
individual laboratories. 




Figure 2. Potential effects of gene knockout within
positively and negatively regulated gene expression
networks. j1 is limiting in wild type for expression of
i1. (A) A simple, two-component linear regulatory
network operating on gene i1, where j1 is a positive
effector of i1 and jn is either a positive or negative
effector of j1. This network could be deduced by
examining the consequence of (B) deleting jn on the
expression of j1 and i1, where the expression of i1
would be decreased or increased depending on
whether jn was a positive or negative regulator.
These and other connected components of even 
greater complexity could be revealed by genome- 
wide expression analysis. From Butow ( HI.



• Clones: the logistics of identifying, obtaining,
and maintaining a set of nonredundant, non-
contaminated, sequence-verified, species/cell/
tissue/field-specific clones.

• Use of inbred strains: where whole-organism
models are being used, the use of inbred
strains is important to reduce the potentially
confusing effects of the individual variation
typically seen in outbred populations.

• Probe: the need for relatively large amounts
of RNA, which limits the type of sample
(e.g., biopsy) that can be used. Also, different
RNA extraction methods can give different
results.

• Specificity: the ability to discriminate accu-
rately between closely related genes (e.g., the
cytochrome P450 family) and splice variants.

• Quantitation: the quantitation of gene
expression using gene arrays is still open to
debate. One reason for this is the different
incorporation of the labeling dyes. However,
the main difficulty lies in knowing what to
normalize against. One option is to include a
large number of so-called housekeeping genes
in the array. However, the expression of these
genes often changes depending on the tissue
and the toxicant, so it is necessary to charac-
terize the expression of these genes in the
model system before utilizing them. This is
clearly not a viable option when screening
multiple new compounds. A second option
is to include on the array genes from a nonre-
lated species (e.g., a plant gene on an animal
array) and to spike the probe with synthetic
RNA(s) complementary to the gene(s).
• Reproducibility: this is sometimes question- 
able, and a figure of approximately two or 
three repeats was used as the minimum num- 
ber required to confirm initial findings. 



Again, however, most people advocated the 
use of Northern blots or reverse transcriptase 
PCR to confirm findings. 

• Sensitivity: concerns were voiced about the 
number of target molecules that must be pre- 
sent in a sample for them to be detected on 
the array. 

• Efficiency: reproducible identification of 1.5- 
to 2-fold differences in expression was report- 
ed, although the number of genes that 
undergo this level of change and remain 
undetected is open to debate. It is important 
that this level of detection be ultimately 
achieved because it is commonly perceived 
that some important transcription factors 
and their regulators respond at such low lev- 
els. In most cases, 3- to 5-fold was the mini- 
mum change that most were happy to 
accept. 

• Bioinformatics: perhaps the greatest concern
was how to interpret the data with
the greatest accuracy and efficiency. The
biggest headache is trying to identify net- 
works of gene expression that are common to 
different treatments or doses. The amount of 
data from a single experiment is huge. It may 
be that, in the future, several groups individ- 
ually equipped with specialized software algo- 
rithms for studying their favorite genes or 
gene systems will be able to share the same
hybridized chips. Thus, arrays could usher in 
a new perspective on collaboration and the 
sharing of data. 
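
Several of the points in the list above (normalizing against spiked control RNA, requiring two or three repeats, and accepting only changes of roughly 3-fold or more) can be combined in one short sketch. The gene names, function names, and default thresholds below are hypothetical and chosen only to mirror the figures quoted at the workshop.

```python
import statistics

def spike_scale_factor(control_spike, treated_spike):
    """Factor that brings the treated array's spike-in signal level with the control's."""
    return statistics.mean(control_spike) / statistics.mean(treated_spike)

def fold_changes(control, treated, scale):
    """Treated/control ratio per gene after applying the spike-in scale factor."""
    return {gene: (treated[gene] * scale) / control[gene]
            for gene in control if gene in treated}

def consistent_hits(replicate_folds, min_fold=3.0, min_replicates=2):
    """Genes changed at least min_fold, up or down, in at least min_replicates experiments."""
    counts = {}
    for folds in replicate_folds:
        for gene, fc in folds.items():
            if fc >= min_fold or fc <= 1.0 / min_fold:
                counts[gene] = counts.get(gene, 0) + 1
    return sorted(gene for gene, n in counts.items() if n >= min_replicates)
```

Genes that survive this filter would still be candidates for confirmation by Northern blot or reverse transcriptase PCR, as most workshop participants advocated.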

Figure 3. Gene expression profiles, also called fingerprints or signatures, of known toxicants or toxi-
cant families may, in the future, be used to identify the potential toxicity of new drugs, etc. In this exam-
ple, the genetic signature of test compound 1 is identical to that of known peroxisome proliferators,
whereas that of test compound 2 does not match any known toxicant family. Based on these results, test
compound 2 would be retained for further testing and test compound 1 would be eliminated.

EPAMAC

Perhaps the main reason most scientists are
unable to use array technology is the high cost
involved, whether buying off-the-shelf mem-
branes, using contract printing services, or
producing chips in-house. In view of this,
researchers at the RTD/NHEERL initiated
the EPAMAC. This consortium brings
together scientists from the EPA and a num- 
ber of extramural labs with the aim of devel- 
oping microarray capability through the shar- 
ing of resources and data. EPAMAC 
researchers are primarily interested in the 
developmental and toxicologic changes seen 
in testicular and breast tissue, and a portion 
of the workshop was set aside for EPAMAC 
members to share their ideas on how the 
experimental application of microarrays could 
facilitate their research. One of the central 
areas of interest to EPAMAC members is the 
effect of xenobiotics on male fertility and 
reproductive health. Of greatest concern is 
the effect of exposure during critical periods 
of development and germ cell differentiation 
(9), and how this may compromise sperm
counts and quality following sexual matura- 
tion (10). As well as spermatogenic tissue, 
there is also interest in how residual mRNA 
found in mature sperm (11) could be used as 
an indicator of previous xenobiotic effects (it 
is easier to obtain a semen sample than a tes- 
ticular biopsy). Arrays will be used to examine 
and compare the effect of exposure to heat 
and chemicals in testicular and epididymal 
gene expression profiles, with the aim of 
establishing relationships/associations 
between changes in developmental landmarks 
and the effects on sperm count and quality. 
Cluster, pattern, and other analysis of such 
data should help identify hidden relationships 
between genes that may reveal potential 
mechanisms of action and uncover roles for 
genes with unknown functions. 

Summary 

The full impact of DNA arrays may not be 
seen for several years, but the interest shown at 
this regional workshop indicates the high level 
of interest that they foster. Apart from educat- 
ing and advertising the various technologies in 
this field, this workshop brought together a 
number of researchers from the Research 
Triangle Park area who are already using DNA 
arrays. The interest in sharing ideas and experi- 
ences led to the initiation of a Triangle array 
user's group.

Array technology is still in its infancy. This
means that the hardware is still improving and
there is no current consensus for standard pro-
cedures, quantitation, and interpretation.
Consistency in spotting and scanning arrays is
not yet optimized, and this is one of the most
critical requirements of any experiment. In
addition, one of the dark regions of array tech-
nology — strife in the courts over who owns
what portions of it — has further muddled the
future and is a potential barrier toward the 
development of consensus procedures. 

Perhaps the greatest hurdle for the applica- 
tion of arrays is the actual interpretation of 
data. No specialists in bioinformatics attended
the workshop, largely because they are rare and
because as yet no one seems clear on the best
method of approaching data analysis and inter-
pretation. Cross-referencing results from mul-
tiple experiments (time, dose, repeats, different
animals, different species) to identify common-
ly expressed genes is a great challenge. In most
cases, we are still a long way from understand-
ing how the expression of gene X is related to
the expression of gene Y and ordering gene
expression to delineate causal relationships. 

To the ordinary scientist in the typical lab- 
oratory, however, the most immediate prob- 
lem is a lack of affordable instrumentation. 
One can purchase premade membranes at
relatively affordable prices. Although these
may be useful in identifying individual genes
to pursue in more detail using other methods,
the numbers that would be required for even a
small routine toxicology experiment prohibit
this as a truly viable approach. For the toxicol-
ogist, there is a need to carry out multiple
experiments — dose responses, time curves,
multiple animals, and repeats. Glass-based
DNA arrays are most attractive in this context
because they can be prepared in large batches
from the same DNA source and accommo-
date control and treated samples on the same
chip. Another problem with current off-the-
shelf arrays is that they often do not contain
one or more of the particular genes a group is
interested in. One alternative is to obtain
or produce a set of custom clones and
have contract printing of membranes or slides
carried out by a company such as Genomic
Solutions, Inc. (Ann Arbor, MI). This approach
is less expensive than laying out capital for
one's own entire system, although at some
point it might make economic sense to print
one's own arrays.

Finally, DNA arrays are currently a team 
effort. They are a technology that uses a wide 
range of skills including engineering, statistics, 
molecular biology, chemistry, and bioinfor-
matics. Because most individuals are skilled in
only one or perhaps two of these areas, it 
appears that success with arrays may be best 
expected by teams of collaborators consisting 
of individuals having each of these skills. 

Those considering array applications may 
be amused or goaded on by the following 
quote from Fortune magazine (12): 

Microprocessors have reshaped our economy,
spawned vast fortunes and changed the way we live. 
Gene chips could be even bigger. 

Although this comment may have been 
designed to excite the imagination rather than 
accurately reflect the truth, it is fair to say that 
the age of functional genomics is upon us. 
DNA arrays look set to be an important tool in 
this new age of biotechnology and will likely 
contribute answers to some of toxicology's 
most fundamental questions.

Subject: RE: [Fwd: Toxicology Chip]
Date: Mon, 3 Jul 2000 08:09:45 -0400
From: "Afshari.Cynthia" <afshari@niehs.nih.gov>
To: "'Diana Hamlet-Cox'" <dianahc@incyte.com>

You can see the list of clones that we have on our 12K chip at
http : mar.uel . r. lehs .nih. scv naps guest clor.esrsh. rfr.

We selected a subset of genes (2000K) that we believed critical to tox
response and basic cellular processes and added a set of clones and ESTs to
this. We have included a set of control genes (80+) that were selected by
the NHGRI because they did not change across a large set of array
experiments. However, we have found that some of these genes change
significantly after tox treatments and are in the process of looking at the
variation of each of these 80+ genes across our experiments.
Our chips are constantly changing and being updated and we hope that our
data will lead us to what the toxchip should really be.
I hope this answers your question.
Cindy Afshari

>
> From: Diana Hamlet-Cox
> Sent: Monday, June 26, 2000 8:52 PM
> To: afshari@niehs.nih.gov
> Subject: [Fwd: Toxicology Chip]
>
> Dear Dr. Afshari,
>
> Since I have not yet had a response from Bill Grigg, perhaps he was not
> the right person to contact.
>
> Can you help me in this matter? I don't need to know the sequences,
> necessarily, but I would like very much to know what types of sequences
> are being used, e.g., GPCRs (more specific?), ion channels, etc.
>
> Diana Hamlet-Cox
>
> Original Message
> Subject: Toxicology Chip
> Date: Mon, 19 Jun 2000 18:31:48 -0700
> From: Diana Hamlet-Cox <dianahc@incyte.com>
> Organization: Incyte Pharmaceuticals
> To: grigg@niehs.nih.gov
>
> Dear Colleague:
>
> I am doing literature research on the use of expressed genes as
> pharmacotoxicology markers, and found the Press Release dated February
> 29, 2000 regarding the work of the NIEHS in this area. I would like to
> know if there is a resource I can access (or you could provide?) that
> would give me a list of the 12,000 genes that are on your Human ToxChip
> Microarray. In particular, I am interested in the criteria used to
> select sequences for the ToxChip, including any control sequences
> included in the microarray.
>
> Thank you for your assistance in this request.
>
> Diana Hamlet-Cox, Ph.D.
> Incyte Genomics, Inc.
>

Proteomics: a major new

Proteomics is a new enabling technology that is being
integrated into the drug discovery process. This will 
facilitate the systematic analysis of proteins across any 
biological system or disease, forwarding new targets 
and information on mode of action, toxicology and sur- 
rogate markers. Proteomics is highly complementary to 
genomic approaches in the drug discovery process and, 
for the first time, offers scientists the ability to integrate 
information from the genome, expressed mRNAs, their 
respective proteins and subcellular localization. It is ex- 
pected that this will lead to important new insights into 
disease mechanisms and improved drug discovery 
strategies to produce novel therapeutics. 

Among the major pharmaceutical and biotechnol- 
ogy companies, it is clearly recognized that the 
business of modern drug discovery is a highly 
competitive process. All of the many steps in- 
volved are inherently complex, and each can involve a 
high risk of attrition. The players in this business strive 
continuously to optimize and streamline the process; each 
seeking to gain an advantage at every step by attempting 
to make informed decisions at the earliest stage possible. 
The desired outcome is to accelerate as many key activities 
in the drug discovery process as possible. This should produce
a new generation of robust drugs that offer a high
probability of success and reach the clinic and market 
ahead of the competition. 

There has been noticeable emphasis over recent years 
for companies to aggressively review and refine their 
strategies to discover new drugs. Central to this has been 
the introduction and implementation of cutting-edge 
technologies. Most, if not all, companies have now inte- 
grated key technology platforms that incorporate gen- 
omics, mRNA expression analysis, relational databases, 
high-throughput robotics, combinatorial chemistry and 
powerful bioinformatics. Although it is still early days to 
quantify the real impact of these platforms in clinical and 
commercial terms, expectations are high, and it is widely 
accepted that significant benefits will be forthcoming. This 
is largely based on data obtained during preclinical studies 
where the genomic 1,2 and microarray 3,4 technologies have 
already proved their value. 

However, there are several noteworthy outcomes that re- 
sult from this. Many comments are voiced that scientists 
armed with these technologies are now commonly faced 
with data overload. Thus, in some instances, rather than 
facilitating the decision process, the accumulation of more 
complex data points, many with unknown consequences, 
can seem to hinder the process. Also, most drug compa- 
nies have simultaneously incorporated very similar compo- 
nents of the new technology platforms, the consequence 
being that it is becoming difficult yet again to determine 
where a clear competitive advantage will arise. Finally, in 
recent years, largely as a result of the accessibility of the 
technologies, there has been an overwhelming emphasis 
placed on genomic and mRNA data rather than on protein 



Martin J. Page*, Bob Amess, Christian Rohlff, Colin Stubberfield and Raj Parekh, Oxford GlycoSciences, 10 The Quadrant,
Abingdon Science Park, Abingdon, Oxfordshire, UK OX14 3YS. *tel: +44 1235 543277, fax: +44 1235 543283,
e-mail: martin.page@ogs.co.uk

DDT Vol. 4, No. 2 February 1999. 1359-6446/99/$ - see front matter © Elsevier Science. All rights reserved. PII: S1359-6446(98)01291-4






[Figure 1 panels: Sample; 2D gels and imaging; Curation and interrogation (composite - normal, composite - disease); Differential analysis (Proteograph™; columns: MCI, fold increase, fold decrease); Mass spectrometry and annotation (axis: Abundance (%)).]

Figure 1. Steps involved in analysing a biological sample by proteomics. MCI, molecular cluster index. 



analysis. It is important to remember that proteins dictate 
biological phenotype - whether it is normal or diseased - 
and are the direct targets for most drugs. 

Proteomics: new technology for
the analysis of proteins 

It is now timely to recognize that complementary technol- 
ogy in the form of high-throughput analysis of the total 
protein repertoire of chosen biological samples, namely 
proteomics, is poised to add a new and important dimen- 
sion to drug discovery. In a similar fashion to genomics, 
which aims to profile every gene expressed in a cell,
proteomics seeks to profile every protein that is expressed 5-7 .
However, there is added information, since proteomics can 
also be used to identify the post-translational modifications 
of proteins 8 , which can have profound effects on bio- 
logical function, and their cellular localization. Importantly, 
proteomics is a technology that integrates the significant 
advances in two-dimensional (2D) electrophoretic separa- 
tion of proteins, mass spectrometry and bioinformatics. 
With these advances it is now possible to consistently de- 
rive proteomes that are highly reproducible and suitable 
for interrogation using advanced bioinformatic tools. 

Laboratories implement proteomics in many different
ways. For the purpose of this review, the
process used at Oxford GlycoSciences (OGS), which uses
an industrial-scale operation that is integral to its drug dis- 
covery work, will be described. The individual steps of 
this process, where up to 1000 2D gels can be run and 
analysed per week, are summarized in Fig. 1. The incom- 
ing samples are bar coded and all information relevant to 
the sample is logged into a Laboratory Information 
Management System (LIMS) database. There can be a wide 
range in the type of samples processed, as applicable to 
individual steps in the drug discovery pipeline, and these 
will be mentioned later. The samples are separated according
to their charge (pI) in the first dimension, using isoelectric
focusing, followed by size (MW) using SDS-PAGE
in the second dimension. Many modifications have been 
made to these steps to improve handling, throughput and 
reproducibility. The separated proteins are then stained 
with fluorescent dyes which are significantly more sensi- 
tive in detection than standard silver methods and have a 
broader dynamic range. The image of the displayed pro- 
teins obtained is referred to as the proteome, and is digi- 
tally scanned into databases using proprietary software 
called ROSETTA™. The images are subsequently curated, 
which begins with the removal of any artefacts, cropping 
and the placement of pI/MW landmarks. The replicate
images are then aligned and matched to one
another to generate a synthetic composite image. This is
an important step, as the proteome is a dynamic situation, 
and it captures the biological variation that occurs, such 
that even orphan proteins are still incorporated into the 
analysis. 

By means of illustration, Fig. 1 shows the process 
whereby proteomes are generated from normal and dis- 
ease samples and how differentially expressed proteins are 
identified. The potential of this type of analysis is tremen- 
dous. For example, from a mammalian cell sample, in ex- 
cess of 2000 proteins can typically be resolved within the 
proteome. The quality of this is shown in Fig. 2, which 
shows representative proteomes from three diverse bio- 
logical sources: human serum, the pathogenic fungus 
Candida albicans and the human hepatoma cell line 
Huh7. 

Use of proteomics to identify 
disease specific proteins 

In most cases, the drug discovery process is initiated by 
the identification of a novel candidate target - almost al- 
ways a protein - that is believed to be instrumental in the 
disease process. To date, there is a variety of means 
whereby drug targets have been forthcoming. These in- 
clude molecular, cellular and genomic approaches, mostly 
centred upon DNA and mRNA analysis. The gene in ques- 
tion is isolated, and expression and characterization of its 
coded protein product - i.e. the drug target - is invariably 
a secondary event. 

With the proteomic approach, the starting point is at the 
other end of the 'telescope'. Here there is direct and immediate
comparison of the proteomes from paired normal
and disease materials. Examples of these pairs are: (1) pu- 
rified epithelial cell populations derived from human 
breast tumours, matched to purified normal populations of 
human breast epithelial cells, and (2) the invading patho- 
genic hyphal form of C. albicans, matched to the non- 
invading yeast form of C. albicans. When the proteome 
images from each pair are aligned, the Proteograph™ soft- 
ware is able to rapidly identify those proteins (each refer- 
enced as having a unique molecular cluster index, or MCI) 
that are either unique, or those that are differentially ex- 
pressed. Thus, the Proteograph output from this analysis is 
both qualitative and quantitative. 

Proteograph analysis for a particular study can also be 
undertaken on any number of samples. For example, one 
might compare anything from a few to several hundred 
preparations or samples, each from a normal and disease 
counterpart, and have these analysed in a single 
Proteograph study. In this way, it is possible to assign 
strong statistical confidence to the data and in some instances
to identify specific subpopulations within the input
biological sources. This feature will become increasingly 
significant in the near future, and there is a clear synergy 
here whereby proteomics can work closely with pharma- 
cogenomic approaches to stratify patient populations and 
achieve effective targeted care for the patient. Whatever 
the source of the materials, the net output of Proteograph 
analysis is immediate identification of disease specific pro- 
teins. This is shown in Fig. 3, which shows the results of 
a proteograph obtained by comparing untreated human 
hepatoma cells with cells following exposure to a clinical 
Figure 2. Representative proteomes obtained from (a) human serum, (b) the pathogenic fungus Candida albicans
and (c) the human hepatoma cell line Huh7.

Figure 3. Table of differential protein expression
profiles, referred to as a Rosetta Proteograph™,
between Huh7 cells with and without the cytotoxic
agent 5-FU. Bars are quantized and do not represent
exact fold change values.



cytotoxic agent. In this instance, only the top 20 differen- 
tially expressed MCIs are shown, but the readout would 
normally extend to a defined cut-off value, typically a two- 
fold or greater difference in expression levels, determined 
by the user. 
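
The cut-off logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Proteograph software: the function name `differential_features`, the dictionary layout (MCI mapped to an abundance value) and the sample numbers are all hypothetical.

```python
def differential_features(normal, disease, min_fold=2.0):
    """Classify features, keyed by MCI, as unique or differentially expressed.

    normal, disease: dicts mapping MCI -> abundance in each composite image.
    This data layout is an assumption made for illustration only.
    """
    results = {}
    for mci in sorted(set(normal) | set(disease)):
        n = normal.get(mci, 0.0)
        d = disease.get(mci, 0.0)
        if n == 0.0 and d == 0.0:
            continue  # feature absent from both proteomes
        if n == 0.0:
            results[mci] = ("unique to disease", None)
        elif d == 0.0:
            results[mci] = ("unique to normal", None)
        else:
            # Express the change as a fold value >= 1 in either direction.
            fold = d / n if d >= n else n / d
            if fold >= min_fold:
                direction = "increase" if d > n else "decrease"
                results[mci] = (direction, fold)
    return results


# Made-up abundances for five features.
normal = {917: 10.0, 852: 4.0, 827: 6.0, 810: 5.0}
disease = {917: 30.0, 852: 1.0, 827: 7.0, 738: 2.0}
hits = differential_features(normal, disease)
```

With the default two-fold cut-off, feature 827 (about 1.2-fold) is dropped, while features present in only one proteome are reported as unique; this mirrors the qualitative and quantitative readout described in the text.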

In a typical analysis involving disease and normal mammalian
material, in which each proteome would have
~2000 protein features, each assigned an MCI, the Proteograph
might identify somewhere in the region of 50-300
MCIs that are unique or differentially expressed. To capitalize
rapidly on these data, OGS applies a high-throughput
mass spectrometry facility, coupled to advanced databases,
to annotate these MCIs as individual proteins. As
these are all disease specific proteins, each could represent 
a novel target and/or a novel disease marker. The process 
becomes even more powerful when a panel of features, 
rather than individual features, is assigned. The relevance
of this is apparent when one considers that most diseases, 
if not all, are multifactorial in nature and arise from poly- 
genic changes. Rather than analysing events in isolation, 
the ability to examine hundreds or thousands of events 
simultaneously, as shown by proteomics, can offer real 
advantages. 

Identification and assignment of candidate targets 
The rapid identification and assignment of candidate tar- 
gets and markers represents a huge challenge, but this has 
been greatly facilitated by combining the recent advances 
made in proteomics and analytical mass spectrometry 9 . 
Using automated procedures it is now possible to annotate 
proteins present in femtomole quantities, which would de- 
pict the low abundance class of proteins. The process of 
annotation is similarly aided by the quality and richness of 
the sequence specific databases that are currently avail- 
able, both in the public domain and in the private sector 
(e.g. those supplied by Incyte Pharmaceuticals). In this re- 
spect, the advances in proteomics have benefited consider- 
ably from the breakthroughs achieved with genomics. 

From an application perspective, cancer studies provide a 
good opportunity whereby proteomics can be instrumental 
in identifying disease specific proteins, because it is often 
feasible to obtain normal and diseased tissue from the same 
patient. For example, proteomic studies have been reported
on neuroblastomas 10 , human breast proteins from
normal and tumour sources 11-13 , lung tumours 14 , colon
tumours 15 and bladder tumours 16 . There are also proteomic
studies reported within the cardiovascular therapeutic area,
in which disease or response proteins are identified 17,18 .

Genomic microarray analysis can similarly identify 
unique species or clusters of mRNAs that are disease spe- 
cific. However, in some instances, there is a clear lack of 
correlation between the levels of a specific mRNA and its 
corresponding protein (Ref. 19; Gygi, S.P. et al., submitted).
This has now been noted by many investigators and
reaffirms that post-transcriptional events, including protein 
stability, protein modification (such as phosphorylation, 
glycosylation, acylation and methylation) and cell localiz- 
ation, can constitute major regulatory steps. Proteomic 
analysis captures all of these steps and can therefore pro- 
vide unique and valuable information independent from, 
or complementary to, genomic data. 
Proteomics for target validation and signal transduction studies

The identification of disease specific proteins alone is in- 
sufficient to begin a drug screening process. It is critical to 
assign function and validation to these proteins by con- 
firming they are indeed pivotal in the disease process. 
These studies need to encompass both gain- and loss-of-function
analyses. This would determine whether the activity
of a candidate target (an enzyme, for example), eliminated 
by molecular/cellular techniques, could reverse a disease 
phenotype. If this happened, then the investigator would 
have increased confidence that a small-molecule inhibitor 
against the target would also have a similar effect. The 
proposal of candidate drug targets is often not a difficult 
process, but validating them is another matter. Validation 
represents a major bottleneck where the wrong decision 
can have serious consequences 20 . 

Proteomics can be used to evaluate the role of a chosen 
target protein in signal transduction cascades directly rel- 
evant to the disease. In this manner, valuable information 
is forthcoming on the signalling pathways that are per- 
turbed by a target protein and how they might be cor- 
rected by appropriate therapeutics. Techniques that are 
well established in one-dimensional protein studies to in- 
vestigate signalling pathways, such as western blotting 
and immunoprecipitation, are highly suited to proteomic 
applications. For example, the proteomes obtained can be 
blotted onto membranes and probed with antibodies 
against the target protein or related signalling mol- 
ecules 21-23 . Because proteomics can resolve >2000 pro- 
teins on a single gel, it is possible to derive important 
information on specific isoforms (such as glycosylated or 
phosphorylated variants) of signalling molecules. This will 
result in characterization of how they are altered in the 
disease process. Western immunoblotting techniques 
using high-affinity antibodies will typically identify proteins
present at ~10 copies per cell (~1.7 fmol); this is in
contrast to the best fluorescent dyes currently available 
that are limited to imaging proteins at 1000 or more 
copies per cell. The level of sensitivity derived by these 
applications will greatly facilitate interpretation of com- 
plex signalling pathways and contribute significantly to 
validation of the target under study. 
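
The relationship between copies per cell and femtomoles quoted above can be checked with a short calculation. The text does not state how many cells are loaded; the figure of 10^8 cells used below is an assumption, chosen because it reproduces the quoted ~1.7 fmol for 10 copies per cell. The function name `femtomoles` is likewise hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def femtomoles(copies_per_cell, n_cells):
    """Amount of one protein species, in fmol, contained in n_cells cells."""
    return copies_per_cell * n_cells / AVOGADRO * 1e15


# Assumed sample size of 1e8 cells (not stated in the text).
western_limit = femtomoles(10, 1e8)    # roughly 1.7 fmol
dye_limit = femtomoles(1000, 1e8)      # roughly 170 fmol
```

Under this assumption, the 100-fold difference in copies per cell between immunoblotting and fluorescent dyes translates directly into a 100-fold difference in the amount of protein that must be present on the gel.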

Immunoprecipitation studies

Similarly, immunoprecipitation studies are another useful 
way to exploit the resolving power of proteomics 24,25 . In 
this instance, very large quantities of protein (e.g. several 
milligrams) can be subjected to incubation with antibodies 
against chosen signalling molecules. This allows high-affinity
capture of these proteins, which can subsequently be
eluted and electrophoresed on a 2D gel to provide a high- 
resolution proteome of a specific subset of proteins. 
Detection by blot analysis allows the identification of ex- 
tremely small amounts of defined signalling molecules. 
Again, the different isoforms of even very low abundance 
proteins can be seen, and, very importantly, the technique 
allows the investigator to identify multiprotein complexes 
or other proteins that co-precipitate with the target protein. 
These coassociating proteins frequently represent sig- 
nalling partners for the target protein, and their identifi- 
cation by mass spectrometry can lead to invaluable infor- 
mation on the signalling processes involved. 

The depth of signal transduction analysis offered by 
proteomics, and the utility for target validation studies, 
can be extended even further by applying cell fraction- 
ation studies 26-28 . By purifying subcellular fractions, such 
as membrane, nuclear, organelle and cytosolic, it is possi- 
ble to assign a localization to proteins of interest and to 
follow their trafficking in a cell. Enrichment of these frac- 
tions will also allow much higher representation of low 
abundance proteins on the proteome. Their detection by 
fluorescent dyes or immunoblot techniques will lead to 
the identification of proteins in the range of 1-10 copies 
per cell, putting the sensitivity on a par with genomic 
approaches. 

These signal transduction analyses can be of additional 
value in experiments where inhibitors derived from a 
screening programme against the target are being evalu- 
ated for their potency and selectivity. The inhibitors can 
encompass small molecules, antisense nucleic acid con- 
structs, dominant-negative proteins, or neutralizing anti- 
bodies microinjected into cells. In each case, proteome 
analysis can provide unique data in support of validation 
studies for a chosen candidate drug target. 

Proteomics and drug mode-of-action studies 

Once a validated target is committed to a screening regi- 
men to identify and advance a lead molecule, it is impor- 
tant to confirm that the efficacy of the inhibitor is through 
the expected mechanism. Such mode-of-action studies are 
usually tackled by various cell biological and biochemical 
methods. Proteomics can also be usefully applied to these 
studies and this is illustrated below by describing data obtained
with OGT719. This is a novel galactosyl derivative of
the cytotoxic agent 5-fluorouracil (5-FU), which is currently 
being developed by OGS for the treatment of hepatocel- 
lular carcinoma and colorectal metastases localized 
in the liver. The premise underpinning the design and ra- 
tionale of OGT719 was to derive a 5-FU prodrug capable 
Figure 4. Features that are specifically up- or downregulated in Huh7 cells by either 5-fluorouracil (5-FU) or
OGT719: (a) elongation factor 1α2, (b) novel (three peptides by MS-MS) and (c) α-subunit of prolyl-4-hydroxylase.
Arrows indicate up- or downregulated.



of targeting, and being retained in, cells bearing the asialo- 
glycoprotein receptor (ASGP-r), including hepatocytes 29 , 
hepatoma Huh7 cells 30 and some colorectal tumour cells 31 . 
The growth of the human hepatoma cell line Huh7 is inhibited
by 5-FU or by OGT719. If the inhibition by
OGT719 were the result of uptake and conversion to 5-FU 
as the active component, then it would be expected that 
Huh7 cells would show similar proteome profiles follow- 
ing exposure to either drug. 

To examine these possibilities, we conducted an experi- 
ment taking samples of Huh7 cells that had been treated 
with IC50 doses of either OGT719 or 5-FU. Total cell lysates
were prepared and taken through 2D electrophoresis, 
fluorescence staining, digital imaging and Proteograph 
analysis. To facilitate the interpretation of the data across 
all of the 2291 features seen on the proteomes, drug- 
induced protein changes of fivefold or greater, identified 
by the Proteograph, were analysed further. Interestingly, 
from this analysis 19 identical proteins were changed five- 
fold or more by both drugs, strongly suggesting similarities 
in the mode of action for these two compounds. 
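
The comparison described above amounts to filtering each drug's profile at a fivefold threshold and intersecting the results. A minimal sketch follows, assuming each profile is available as a mapping from feature identifier to the treated/untreated abundance ratio. The function names, the data layout and the requirement that a feature change in the same direction under both drugs are assumptions for illustration; the text only says the proteins were changed fivefold or more by both drugs.

```python
def changed_features(fold_changes, min_fold=5.0):
    """Return {feature: 'up' | 'down'} for features changed min_fold-fold or more.

    fold_changes maps feature id -> treated/untreated abundance ratio (> 0).
    """
    changed = {}
    for feature, ratio in fold_changes.items():
        if ratio >= min_fold:
            changed[feature] = "up"
        elif ratio <= 1.0 / min_fold:
            changed[feature] = "down"
    return changed


def co_modulated(drug_a, drug_b, min_fold=5.0):
    """Features changed min_fold-fold or more, in the same direction, by both drugs."""
    a = changed_features(drug_a, min_fold)
    b = changed_features(drug_b, min_fold)
    return sorted(f for f in a if f in b and a[f] == b[f])


# Made-up ratios for four features under each drug.
profile_5fu = {"f1": 6.0, "f2": 0.10, "f3": 1.2, "f4": 8.0}
profile_ogt719 = {"f1": 5.0, "f2": 0.15, "f3": 7.0, "f4": 0.1}
shared = co_modulated(profile_5fu, profile_ogt719)
```

In this toy data, `f3` changes under only one drug and `f4` changes in opposite directions, so neither counts as co-modulated. An empty result, as reported later for HSN cells, would argue against a shared mode of action.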

Thus, from very complex data involving >2000 protein 
features, using proteomics it is possible to analyse quanti- 
tatively and qualitatively each protein during its exposure 
to drugs. The biologist is now able to focus a series of fur- 
ther studies specifically on an enriched subset of proteins. 



Figure 4 shows highlighted examples of the selected areas 
of the proteome where some of these identified proteins in 
the above study are altered in response to either or both 
drugs. 

Several of the proteins identified above as being modu- 
lated similarly by 5-FU or OGT719 in Huh7 cells were sub- 
jected to tandem mass-spectrometric analysis for anno- 
tation. Some of these, such as the nuclear ribosomal 
RNA-binding protein 32 , can be placed into pyrimidine 
pathways or related cell cycle/growth biochemical path- 
ways in which 5-FU is known to act. 

To attribute further significance to the proteome mode- 
of-action studies with OGT719, another cell line, the rat 
sarcoma HSN, was used. Growth of these cells is inhibited 
by 5-FU, but they are completely refractory to OGT719; 
notably they lack the ASGP-r, which might explain this 
finding (unpublished). For our proteome studies, HSN 
cells were treated with 5-FU or OGT719 over a time course 
of one, two and four days. At each time point, cells were 
harvested and processed to derive proteomes and 
Proteographs. As before, we purposely focused on those 
proteins that increased or decreased by fivefold or more. 
In this instance, there were no proteins co-modulated by 
the two drugs. This is perhaps to be expected, given that 
the HSN cells are killed by 5-FU and yet are refractory to 
OGT719. 
Clear potential

The above is just an example of how proteomics can be 
used to address the mode of action of anticancer drugs. 
The potential of this approach is clear, and one can envis- 
age situations where it will be profitable to compare the 
proteomes of cells in which the drug target has been elimi- 
nated by molecular knockout techniques, or with small- 
molecule inhibitors believed to act specifically on the same 
target. In addition to using proteomics to examine the ac- 
tion of drugs, it is also possible to use this approach to 
gauge the extent of nonspecific effects that might eventu- 
ally lead to toxicity. For instance, in the example used 
above with HSN cells treated with OGT719, although cell 
growth was not affected, the levels of several specific pro- 
teins were changed. Further investigation of these proteins 
and the signalling pathways in which they are involved 
could be illuminating in predicting the likelihood or other- 
wise of long-term toxicity. 

Use of proteomics in formal drug
toxicology studies 

A drug discovery programme at the stage where leads 
have been identified and mode-of-action studies are ad- 
vanced, will proceed to investigate the pharmacokinetic 
and toxicology profile of those agents. These two param- 
eters are of major importance in the drug discovery 
process, and many agents that have looked highly promis- 
ing from in vitro studies have subsequently failed because 
of insurmountable pharmacokinetic and/or toxicity prob- 
lems in vivo. Whereas the pharmacokinetic properties of a 
molecule can now be characterized quickly and accu- 
rately, toxicity studies are typically much longer and more 
demanding in their interpretation. 

The ability to achieve fast and accurate predictions of 
toxicity within an in vivo setting would represent a big 
step forward in accelerating any drug discovery pro- 
gramme. Toxicity from a drug can be manifested in any 
organ. However, because the liver and kidney are the 
major sites in the body responsible for metabolism and 
elimination of most drugs, it is informative to examine 
these particular organs in detail to provide early indi- 
cations about events that might result in toxicity. 

The basis for most xenobiotic metabolizing activity is to 
increase the hydrophilicity of the compound and so facilitate
its removal from the body. Most drugs are metabolized
in the liver via the cytochrome P450 family of enzymes,
which are known to comprise a total of ~200
different members 33,34 , encompassing a wide array of
overlapping specificities for different substrates. In addition
to clearance, they also play a major role in metabolism
that can lead to the production and removal of toxic
species, and in some instances it is possible to correlate 
the ability or failure to remove such a toxin with a specific 
P450 or subgroup. 

Unique P450 profiles 

Each individual person will have a slightly different P450 
profile, largely from polymorphisms and changes in ex- 
pression levels, although other genetic and environmental 
factors aside from P450 also need to be taken into consid- 
eration. A significant amount of research is currently 
being directed towards this field - known as pharmacoge- 
nomics - with the aim of predicting how a patient will re- 
spond to a drug, as determined by their genetic make- 
up 35-37 . The marked variation of individuals in their ability 
to clear a compound can be one of the key factors in de- 
ciding the overall pharmacokinetic profile of a drug. Not 
only will this have a bearing on the likelihood of a patient 
responding to a treatment, but it will also be a factor in 
determining the possibility of their experiencing an ad- 
verse effect. 

Many pharmaceutical companies are already employing 
genomic approaches, involving P450 measurements, as a 
key step in their assessment of the toxicological profile of 
a candidate drug and therefore of its suitability, or other- 
wise, to be considered for human clinical trials. There are 
limits to this approach, however. Whereas the P450 mRNA 
profiling can predict with some accuracy the likely meta- 
bolic fate of a drug, it will not provide information on 
whether the metabolites would subsequently lead to tox- 
icity. Besides the patient-to-patient differences in steady- 
state levels of the P450s, there are also characteristic induc- 
tion responses of these enzymes to some drugs. Moreover, 
as there can be some doubt over the correlation of mRNA 
levels and the corresponding protein levels, there is scope 
for misinterpretation of the results and hence real advan- 
tages to be gained from a proteome approach. In both in- 
stances, the ability to examine entire proteome profiles, in- 
cluding the P450 proteins, will be a significant advantage 
in understanding and predicting the metabolism and 
toxicological outcome of drugs. 

In addition to direct organ and tissue studies, the serum, 
which collects the majority of toxicity markers released 
from susceptible organs and tissues throughout the entire 
body, can be utilized. Serum is rich in nuclease activity 
and, as pharmacogenomics is not suited to deal with these 
samples, valuable markers of toxicity could go undetected. 
However, by using proteomics for these types of analyses, 
serum markers (and clusters thereof) are now accessible 
for evaluation as indicators of toxicity. 
Pharmacoproteomics

Proteomics can thus be used to add a new sphere of 
analysis to the study of toxicity at the protein level, and in 
the era of '-omics' there is a case to be made to adopt the 
term 'Pharmacoproteomics™'. Animals can be dosed with
increasing levels of an experimental drug over time, and 
serum samples can be drawn for consecutive proteome 
analyses. Using this procedure, it should be possible to 
identify individual markers, or clusters thereof, that are 
dose related and correlate with the emergence and severity 
of toxicity. Markers might appear in the serum at a defined 
drug dose and time that are predictive of early toxicity 
within certain organs and if allowed to continue will have 
damaging consequences. These serum markers could sub- 
sequently be used to predict the response of each individ- 
ual and allow tailoring of therapy whereby optimal effi- 
cacy is achieved without adverse side effects being 
apparent. This application can obviously extend to track- 
ing toxicity of drugs in clinical trials where serum can be 
readily drawn and analysed. Surrogate markers for drug ef- 
ficacy could also be detected by this procedure and could 
facilitate the challenge of identifying patient classes who 
will respond favourably to a drug and at what dosage. 
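
One simple way to look for dose-related serum markers of the kind described above is to correlate each marker's abundance with the administered dose. The sketch below uses a plain Pearson correlation and a threshold of 0.9. Both choices, along with the function names and the sample values, are assumptions made for illustration and are not taken from the text.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no dose information
    return cov / sqrt(vx * vy)


def dose_related(doses, marker_levels, min_r=0.9):
    """Markers whose abundance rises or falls with dose (|r| >= min_r)."""
    return sorted(
        marker
        for marker, levels in marker_levels.items()
        if abs(pearson(doses, levels)) >= min_r
    )


# Made-up serum abundances at five dose levels.
doses = [0, 1, 2, 4, 8]
serum = {
    "marker_A": [1.0, 1.5, 2.0, 3.0, 5.0],  # rises with dose
    "marker_B": [2.0, 2.1, 1.9, 2.0, 2.1],  # essentially flat
    "marker_C": [9.0, 8.0, 7.0, 5.0, 1.0],  # falls with dose
}
candidates = dose_related(doses, serum)
```

A real study would also need replicate animals, a time axis and a correction for the number of features tested; this sketch only shows the basic filtering step.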

Conclusions

By contrast to the agents administered to patients in clini- 
cal wards, the process of drug discovery is not a prescrip- 
tive series of steps. The risks are high and there are long 
timelines to be endured before it is known whether a can- 
didate drug will succeed or fail. At each step of the drug 
discovery process there is often scope for flexibility in in- 
terpretation, which over many steps is cumulative. The 
pharmaceutical companies most likely to succeed in this 
environment are those that are able to make informed 
accurate decisions within an accelerated process. 

The genomics revolution has impacted very positively 
upon these issues and now has a powerful new partner in 
proteomics. The ability to undertake global analysis of pro- 
teins from a very wide diversity of biological systems and 
to interrogate these in a high-throughput, systematic man- 
ner will add a significant new dimension to drug discov- 
ery. Each step of the process from target discovery to clini- 
cal trials is accessible to proteomics, often providing 
unique sets of data. Using the combination of genomics 
and proteomics, scientists can now see every dimension of 
their biological focus, from genes and mRNA to proteins and
their subcellular localization. This will greatly assist our
understanding of the fundamental mechanistic basis of
human disease and allow new, improved and speedier
drug discovery strategies to be implemented. 
ABSTRACT Pairwise sequence comparison methods have
been assessed using proteins whose relationships are known
reliably from their structures and functions, as described in
the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T.
& Chothia C. (1995) J. Mol. Biol. 247, 536-540]. The evaluation
tested the programs BLAST [Altschul, S. F., Gish, W.,
Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol.
215, 403-410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996)
Methods Enzymol. 266, 460-480], FASTA [Pearson, W. R. &
Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448],
and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol.
Biol. 147, 195-197] and their scoring schemes. The error rate
of all algorithms is greatly reduced by using statistical scores 
to evaluate matches rather than percentage identity or raw 
scores. The E-value statistical scores of SSEARCH and FASTA are 
reliable: the number of false positives found in our tests agrees 
well with the scores reported. However, the P-values reported 
by BLAST and WU-BLAST2 exaggerate significance by orders of 
magnitude. SSEARCH, FASTA ktup = 1, and WU-BLAST2 perform
best, and they are capable of detecting almost all relationships 
between proteins whose sequence identities are >30%. For 
more distantly related proteins, they do much less well; only 
one-half of the relationships between proteins with 20-30% 
identity are found. Because many homologs have low sequence 
similarity, most distant relationships cannot be detected by 
any pairwise comparison method; however, those which are 
identified may be used with confidence. 



Sequence database searching plays a role in virtually every 
branch of molecular biology and is crucial for interpreting the 
sequences issuing forth from genome projects. Given the 
method's central role, it is surprising that overall and relative 
capabilities of different procedures are largely unknown. It is 
difficult to verify algorithms on sample data because this 
requires large data sets of proteins whose evolutionary rela- 
tionships are known unambiguously and independently of the 
methods being evaluated. However, nearly all known ho- 
mologs have been identified by sequence analysis (the method 
to be tested). Also, it is generally very difficult to know, in the 
absence of structural data, whether two proteins that lack clear 
sequence similarity are unrelated. This has meant that al- 
though previous evaluations have helped improve sequence 
comparison, they have suffered from insufficient, imperfectly 
characterized, or artificial test data. Assessment also has been 
problematic because high quality database sequence searching 
attempts to have both sensitivity (detection of homologs) and 
specificity (rejection of unrelated proteins); however, these 
complementary goals are linked such that increasing one 
causes the other to be reduced. 
Sequence comparison methodologies have evolved rapidly, 
so no previously published test has evaluated modern versions 
of the programs commonly used. For example, parameters in 
BLAST (1) have changed, and WU-BLAST2 (2), which produces 
gapped alignments, has become available. The latest version 
of FASTA (3) previously tested was 1.6, but the current release 
(version 3.0) provides fundamentally different results in the 
form of statistical scoring. 

The previous reports also have left gaps in our knowledge. 
For example, there has been no published assessment of 
thresholds for scoring schemes more sophisticated than per- 
centage identity. Thus, the widely discussed statistical scoring 
measures have never actually been evaluated on large data- 
bases of real proteins. Moreover, the different scoring schemes 
commonly in use have not been compared. 

Beyond these issues, there is a more fundamental question: 
in an absolute sense, how well does pairwise sequence com- 
parison work? That is, what fraction of homologous proteins 
can be detected using modern database searching methods? 

In this work, we attempt to answer these questions and to 
overcome both of the fundamental difficulties that have hin- 
dered assessment of sequence comparison methodologies. 
First, we use the set of distant evolutionary relationships in the 
SCOP: Structural Classification of Proteins database (4), which 
is derived from structural and functional characteristics (5). 
The SCOP database provides a uniquely reliable set of ho- 
mologs, which are known independently of sequence compar- 
ison. Second, we use an assessment method that jointly mea- 
sures both sensitivity and specificity. This method allows 
straightforward comparison of different sequence searching 
procedures. Further, it can be used to aid interpretation of real 
database searches and thus provide optimal and reliable 
results. 

Previous Assessments of Sequence Comparison. Several 
previous studies have examined the relative performance of 
different sequence comparison methods. The most encom- 
passing analyses have been by Pearson (6, 7), who compared 
the three most commonly used programs. Of these, the Smith- 
Waterman algorithm (8) implemented in SSEARCH (3) is the 
oldest and slowest but the most rigorous. Modern heuristics 
have provided BLAST (1) the speed and convenience to make 
it the most popular program. Intermediate between these two 
is FASTA (3), which may be run in two modes offering either 
greater speed (ktup = 2) or greater effectiveness (ktup = 1). 
Pearson also considered different parameters for each of these 
programs. 

To test the methods, Pearson selected two representative 
proteins from each of 67 protein superfamilies defined by the 
PIR database (9). Each was used as a query to search the 
database, and the matched proteins were marked as being 
homologous or unrelated according to their membership of PIR 
superfamilies. Pearson found that modern matrices and 
"ln-scaling" of raw scores improve results considerably. He also 
reported that the rigorous Smith-Waterman algorithm worked 
slightly better than FASTA, which was in turn more effective 
than BLAST. 

Very large scale analyses of matrices have been performed 
(10), and Henikoff and Henikoff (11) also evaluated the 
effectiveness of BLAST and FASTA. Their test with BLAST 
considered the ability to detect homologs above a predetermined 
score but had no penalty for methods which also 
reported large numbers of spurious matches. The Henikoffs 
searched the SWISS-PROT database (12) and used PROSITE (13) 
to define homologous families. Their results showed that the 
BLOSUM62 matrix (14) performed markedly better than the 
extrapolated PAM-series matrices (15), which previously had 
been popular. 

A crucial aspect of any assessment is the data that are used 
to test the ability of the program to find homologs. But in 
Pearson's and the Henikoffs' evaluations of sequence com- 
parison, the correct results were effectively unknown. This is 
because the superfamilies in PIR and PROSITE are principally 
created by using the same sequence comparison methods 
which are being evaluated. Interdependency of data and 
methods creates a "chicken and egg" problem, and means for 
example, that new methods would be penalized for correctly 
identifying homologs missed by older programs. For instance, 
immunoglobulin variable and constant domains are clearly 
homologous, but PIR places them in different superfamilies. 
The problem is widespread: each superfamily in PIR 48.00 with 
a structural homolog is itself homologous to an average of 1.6 
other PIR superfamilies (16). 

To surmount these sorts of difficulties, Sander and Schnei- 
der (17) used protein structures to evaluate sequence com- 
parison. Rather than comparing different sequence compari- 
son algorithms, their work focused on determining a length- 
dependent threshold of percentage identity, above which all 
proteins would be of similar structure. A result of this analysis 
was the HSSP equation; it states that proteins with 25% identity 
over 80 residues will have similar structures, whereas shorter 
alignments require higher identity. (Other studies also have 
used structures (18-20), but these focused on a small number 
of model proteins and were principally oriented toward eval- 
uating alignment accuracy rather than homology detection.) 

A general solution to the problem of scoring comes from 
statistical measures (i.e., E-values and P-values) based on the 
extreme value distribution (21). Extreme value scoring was 
implemented analytically in the BLAST program using the 
Karlin and Altschul statistics (22, 23) and empirical ap- 
proaches have been recently added to FASTA and SSEARCH. In 
addition to being heralded as a reliable means of recognizing 
significantly similar proteins (24, 25), the mathematical 
tractability of statistical scores "is a crucial feature of the BLAST 
algorithm" (1). The validity of this scoring procedure has been 
tested analytically and empirically (see ref. 2 and references in 
ref. 24). However, all large empirical tests used random 
sequences that may lack the subtle structure found within 
biological sequences (26, 27) and obviously do not contain any 
real homologs. Thus, although many researchers have sug- 
gested that statistical scores be used to rank matches (24, 25, 
28), there have been no large rigorous experiments on biolog- 
ical data to determine the degree to which such rankings are 
superior. 

A Database for Testing Homology Detection. Since the 
discovery that the structures of hemoglobin and myoglobin are 
very similar though their sequences are not (29), it has been 
apparent that comparing structures is a more powerful (if less 
convenient) way to recognize distant evolutionary relation- 
ships than comparing sequences. If two proteins show a high 
degree of similarity in their structural details and function, it 
is very probable that they have an evolutionary relationship 
though their sequence similarity may be low. 

The recent growth of protein structure information, combined 
with the comprehensive evolutionary classification in 
the SCOP database (4, 5), has allowed us to overcome previous 
limitations. With these data, we can evaluate the performance 
of sequence comparison methods on real protein sequences 
whose relationships are known confidently. The SCOP database 
uses structural information to recognize distant homologs, the 
large majority of which can be determined unambiguously. 
These superfamilies, such as the globins or the immunoglobu- 
lins, would be recognized as related by the vast majority of the 
biological community despite the lack of high sequence sim- 
ilarity. 

From SCOP, we extracted the sequences of domains of 
proteins in the Protein Data Bank (PDB) (30) and created two 
databases. One (PDB90D-B) has domains that are all <90% 
identical to one another, whereas the other (PDB40D-B) has those 
<40% identical. The databases were created by first sorting all 
protein domains in SCOP by their quality and making a list. The 
highest quality domain was selected for inclusion in the 
database and removed from the list. Also removed from the list 
(and discarded) were all other domains above the threshold 
level of identity to the selected domain. This process was 
repeated until the list was empty. The PDB40D-B database 
contains 1,323 domains, which have 9,044 ordered pairs of 
distant relationships, or ~0.5% of the total 1,749,006 ordered 
pairs. In PDB90D-B, the 2,079 domains have 53,988 relation- 
ships, representing 1.2% of all pairs. Low complexity regions 
of sequence can achieve spurious high scores, so these were 
masked in both databases by processing with the SEG program 
(27) using recommended parameters: 12 1.8 2.0. The databases 
used in this paper are available from http://sss.stanford.edu/ 
sss/, and databases derived from the current version of SCOP 
may be found at http://scop.mrc-lmb.cam.ac.uk/scop/. 
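
As an illustration only, the greedy selection procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used to build PDB40D-B or PDB90D-B: the `quality` and `identity` callables are hypothetical placeholders for whatever quality ranking and pairwise identity measure were actually used, and the helper `ordered_pairs` simply reproduces the pair counts quoted in the text.

```python
from typing import Callable, Hashable, Iterable, List


def select_nonredundant(
    domains: Iterable[Hashable],
    quality: Callable[[Hashable], float],          # hypothetical quality measure
    identity: Callable[[Hashable, Hashable], float],  # hypothetical % identity
    threshold: float,
) -> List[Hashable]:
    """Greedy selection: keep the best remaining domain, drop its near-duplicates."""
    # Sort once, best quality first (the "list" described in the text).
    remaining = sorted(domains, key=quality, reverse=True)
    selected = []
    while remaining:
        # The highest quality domain still on the list is kept.
        best = remaining.pop(0)
        selected.append(best)
        # Every other domain at or above the identity threshold is discarded.
        remaining = [d for d in remaining if identity(best, d) < threshold]
    return selected


def ordered_pairs(n: int) -> int:
    """Number of ordered (query, target) pairs among n distinct sequences."""
    return n * (n - 1)
```

With this helper, `ordered_pairs(1323)` gives the 1,749,006 ordered pairs quoted for PDB40D-B, of which the 9,044 homologous pairs are about 0.5%; for PDB90D-B, 53,988 of `ordered_pairs(2079)` is about 1.2%.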

Analyses from both databases were generally consistent, but 
PDB40D-B focuses on distantly related proteins and reduces the 
heavy overrepresentation in the PDB of a small number of 
families (31, 32), whereas PDB90D-B (with more sequences) 
improves evaluations of statistics. Except where noted other- 
wise, the distant homolog results here are from PDB40D-B. 
Although the precise numbers reported here are specific to the 
structural domain databases used, we expect the trends to be 
general. 

Assessment Data and Procedure. Our assessment of se- 
quence comparison may be divided into four different major 
categories of tests. First, using just a single sequence compar- 
ison algorithm at a time, we evaluated the effectiveness of 
different scoring schemes. Second, we assessed the reliability 
of scoring procedures, including an evaluation of the validity 
of statistical scoring. Third, we compared sequence compari- 
son algorithms (using the optimal scoring scheme) to deter- 
mine their relative performance. Fourth, we examined the 
distribution of homologs and considered the power of pairwise 
sequence comparison to recognize them. All of the analyses 
used the databases of structurally identified homologs and a 
new assessment criterion. 

The analyses tested BLAST (1), version 1.4.9MP, and 
WU-BLAST2 (2), version 2.0a13MP. Also assessed was the FASTA 
package, version 3.0t76 (3), which provided FASTA and the 
SSEARCH implementation of Smith-Waterman (8). For 
SSEARCH and FASTA, we used BLOSUM45 with gap penalties 
-12/-1 (7, 16). The default parameters and matrix 
(BLOSUM62) were used for BLAST and WU-BLAST2. 

The "Coverage Vs. Error" Plot. To test a particular protocol 
(comprising a program and scoring scheme), each sequence 
from the database was used as a query to search the database. 
This yielded ordered pairs of query and target sequences with 
associated scores, which were sorted, on the basis of their 
scores, from best to worst. The ideal method would have 
Fig. 1. Coverage vs. error plots of different scoring schemes for SSEARCH Smith-Waterman. (A) Analysis of PDB40D-B database. (B) Analysis 
of PDB90D-B database. All of the proteins in the database were compared with each other using the ssearch program. The results of this single 
set of comparisons were considered using five different scoring schemes and assessed. The graphs show the coverage and errors per query (EPQ) 
for statistical scores, raw scores, and three measures using percentage identity. In the coverage vs. error plot, the x axis indicates the fraction of 
all homologs in the database (known from structure) which have been detected. Precisely, it is the number of detected pairs of proteins with the 
same fold divided by the total number of pairs from a common superfamily. PDB40D-B contains a total of 9,044 homologs, so a score of 10% indicates 
identification of 904 relationships. The y axis reports the number of EPQ. Because there are 1,323 queries made in the PDB40D-B all-vs.-all 
comparison, 13 errors corresponds to 0.01, or 1% EPQ. The y axis is presented on a log scale to show results over the widely varying degrees of 
accuracy which may be desired. The scores that correspond to the levels of EPQ and coverage are shown in Fig. 4 and Table 1. The graph 
demonstrates the trade-off between sensitivity and selectivity. As more homologs are found (moving to the right), more errors are made (moving 
up). The ideal method would be in the lower right corner of the graph, which corresponds to identifying many evolutionary relationships without 
selecting unrelated proteins. Three measures of percentage identity are plotted. Percentage identity within alignment is the degree of identity within 
the aligned region of the proteins, without consideration of the alignment length. Percentage identity within both is the number of identical residues 
in the aligned region as a percentage of the average length of the query and target proteins. The HSSP equation (17) is H = 290.15 l^{-0.562}, where 
l is length, for 10 < l < 80; H > 100 for l < 10; H = 24.7 for l > 80. The percentage identity HSSP-adjusted score is the percent identity within 
the alignment minus H. Smith-Waterman raw scores and E-values were taken directly from the sequence comparison program. 



perfect separation, with all of the homologs at the top of the 
list and unrelated proteins below. In practice, perfect separa- 
tion is impossible to achieve so instead one is interested in 
drawing a threshold above which there are the largest number 
of related pairs of sequences consistent with an acceptable 
error rate. 

Our procedure involved measuring the coverage and error 
for every threshold. Coverage was defined as the fraction of 
structurally determined homologs that have scores above the 
selected threshold; this reflects the sensitivity of a method. 
Errors per query (EPQ), an indicator of selectivity, is the 
number of nonhomologous pairs above the threshold divided 
by the number of queries. Graphs of these data, called 
coverage vs. error plots, were devised to understand how 



protocols compare at different levels of accuracy. These 
graphs share effectively all of the beneficial features of 
Receiver Operating Characteristic (ROC) plots (33, 34) but 
better represent the high degrees of accuracy required in 
sequence comparison and the huge background of nonho- 
mologs. 
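
The definitions of coverage and EPQ given above can be made concrete with a short Python sketch. It is an illustration under assumptions, not the evaluation code used in this work: the function names and the `(score, is_homolog)` input format are hypothetical, and tied scores are simply taken in sorted order.

```python
from typing import Iterable, List, Tuple


def coverage_vs_error(
    hits: Iterable[Tuple[float, bool]],
    total_homolog_pairs: int,
    num_queries: int,
    higher_is_better: bool = True,
) -> List[Tuple[float, float]]:
    """Return (coverage, errors per query) after each hit, best score first.

    `hits` holds one (score, is_homolog) entry per ordered query/target pair.
    Use higher_is_better=False for E-values or P-values.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=higher_is_better)
    curve = []
    true_found = 0
    errors = 0
    for _score, is_homolog in ranked:
        if is_homolog:
            true_found += 1   # a structurally known homolog above the threshold
        else:
            errors += 1       # a nonhomologous pair above the threshold
        curve.append((true_found / total_homolog_pairs, errors / num_queries))
    return curve


def coverage_at_epq(curve: List[Tuple[float, float]], max_epq: float) -> float:
    """Best coverage reached while errors per query stay at or below max_epq."""
    eligible = [cov for cov, epq in curve if epq <= max_epq]
    return max(eligible, default=0.0)
```

Each point of the returned curve corresponds to one possible threshold, so plotting the second element on a log scale against the first gives a plot of the kind shown in Fig. 1.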

This assessment procedure is directly relevant to practical 
sequence database searching, for it provides precisely the 
information necessary to perform a reliable sequence database 
search. The EPQ measure places a premium on score consis- 
tency; that is, it requires scores to be comparable for different 
queries. Consistency is an aspect which has been largely 

Fig. 2. Unrelated proteins with high percentage identity. 
Hemoglobin β-chain (PDB code 1hds chain b, ref. 38, Left) and cellulase E2 
(PDB code 1tml, ref. 39, Right) have 39% identity over 64 residues, a 
level which is often believed to be indicative of homology. Despite this 
high degree of identity, their structures strongly suggest that these 
proteins are not related. Appropriately, neither the raw alignment 
score of 85 nor the E-value of 1.3 is significant. Proteins rendered by 
RASMOL (40). 
Fig. 3. Length and percentage identity of alignments of unrelated 
proteins in PDB90D-B: Each pair of nonhomologous proteins found with 
SSEARCH is plotted as a point whose position indicates the length and 
the percentage identity within the alignment. Because alignment 
length and percentage identity are quantized, many pairs of proteins 
may have exactly the same alignment length and percentage identity. 
The line shows the HSSP threshold (though it is intended to be applied 
with a different matrix and parameters). 
Fig. 4. Reliability of statistical scores in PDB90D-B: Each line shows 
the relationship between reported statistical score and actual error 
rate for a different program. E-values are reported for SSEARCH and 
FASTA, whereas P-values are shown for BLAST and WU-BLAST2. If the 
scoring were perfect, then the number of errors per query and the 
E-values would be the same, as indicated by the upper bold line. 
(P-values should be the same as EPQ for small numbers and diverge 
at higher values, as indicated by the lower bold line.) E-values from 
SSEARCH and FASTA are shown to have good agreement with EPQ but 
underestimate the significance slightly. BLAST and WU-BLAST2 are 
overconfident, with the degree of exaggeration dependent upon the 
score. The results for PDB40D-B were similar to those for PDB90D-B 
despite the difference in number of homologs detected. This graph 
could be used to roughly calibrate the reliability of a given statistical 
score. 

ignored in previous tests but is essential for the straightforward 
or automatic interpretation of sequence comparison results. 
Further, it provides a clear indication of the confidence that 
should be ascribed to each match. Indeed, the EPQ measure 
should approximate the expectation value reported by data- 
base searching programs, if the programs' estimates are accu- 
rate. 

The Performance of Scoring Schemes. All of the programs 
tested could provide three fundamental types of scores. The 
first score is the percentage identity, which may be computed 
in several ways based on either the length of the alignment or 
the lengths of the sequences. The second is a "raw" or 
"Smith-Waterman" score, which is the measure optimized by 
the Smith-Waterman algorithm and is computed by summing 
the substitution matrix scores for each position in the alignment 
and subtracting gap penalties. In BLAST, a measure 



related to this score is scaled into bits. Third is a statistical 
score based on the extreme value distribution. These results 
are summarized in Fig. 1. 
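
To make the first two score types concrete, the following Python sketch computes a raw score and two of the percentage identity measures from an alignment that has already been produced. It is a sketch under several assumptions: the alignment is given as two equal-length strings with "-" marking gaps, the `substitution` callable is a hypothetical stand-in for a real matrix such as BLOSUM45, the gap penalty is read as 12 for the first residue of a gap and 1 for each further residue, and the "within alignment" denominator is taken to be the number of alignment columns including gaps.

```python
from typing import Callable


def raw_score(aligned_a: str, aligned_b: str,
              substitution: Callable[[str, str], int],
              gap_open: int = 12, gap_extend: int = 1) -> int:
    """Sum substitution scores over aligned columns and subtract gap penalties."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have equal length")
    score = 0
    gap_side = None  # which sequence the current gap run is in, if any
    for a, b in zip(aligned_a, aligned_b):
        side = "a" if a == "-" else "b" if b == "-" else None
        if side is None:
            score += substitution(a, b)
        elif side == gap_side:
            score -= gap_extend   # continuing an existing gap
        else:
            score -= gap_open     # first residue of a new gap
        gap_side = side
    return score


def _identical_columns(aligned_a: str, aligned_b: str) -> int:
    return sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")


def percent_identity_within_alignment(aligned_a: str, aligned_b: str) -> float:
    """Identity within the aligned region (assumed: per alignment column)."""
    return 100.0 * _identical_columns(aligned_a, aligned_b) / len(aligned_a)


def percent_identity_within_both(aligned_a: str, aligned_b: str,
                                 len_a: int, len_b: int) -> float:
    """Identical residues as a percentage of the average full sequence length."""
    return 100.0 * _identical_columns(aligned_a, aligned_b) / ((len_a + len_b) / 2)
```

The second identity measure penalizes short local alignments between long proteins, which is one reason the two measures can rank the same pair very differently.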

Sequence Identity. Though it has been long established that 
percentage identity is a poor measure (35), there is a common 
rule-of-thumb stating that 30% identity signifies homology. 
Moreover, publications have indicated that 25% identity can 
be used as a threshold (17, 36). We find that these thresholds, 
originally derived years ago, are not supported by present 
results. As databases have grown, so have the possibilities for 
chance alignments with high identity; thus, the reported cutoffs 
lead to frequent errors. Fig. 2 shows one of the many pairs of 
proteins with very different structures that nonetheless have 
high levels of identity over considerable aligned regions. 
Despite the high identity, the raw and the statistical scores for 
such incorrect matches are typically not significant. The prin- 
cipal reasons percentage identity does so poorly seem to be 
that it ignores information about gaps and about the conser- 
vative or radical nature of residue substitutions. 

From the PDB90D-B analysis in Fig. 3, we learn that 30% 
identity is a reliable threshold for this database only for 
sequence alignments of at least 150 residues. Because one 
unrelated pair of proteins has 43.5% identity over 62 residues, 
it is probably necessary for alignments to be at least 70 residues 
in length before 40% is a reasonable threshold, for a database 
of this particular size and composition. 

At a given reliability, scores based on percentage identity 
detect just a fraction of the distant homologs found by 
statistical scoring. If one measures the percentage identity in 
the aligned regions without consideration of alignment length, 
then a negligible number of distant homologs are detected. 
Use of the HSSP equation improves the value of percentage 
identity, but even this measure can find only 4% of all known 
homologs at 1% EPQ. In short, percentage identity discards 
most of the information measured in a sequence comparison. 
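
For reference, the HSSP-adjusted score can be sketched in Python, reading the equation in the Fig. 1 caption as H = 290.15 l^(-0.562) for alignment lengths between 10 and 80 residues, with H = 24.7 above 80. This is a minimal sketch, not the code used in this work. Two choices are assumptions: the caption only says H > 100 below 10 residues, so infinity is used there to mean "never passes", and the boundary lengths 10 and 80 are handled by the power-law branch.

```python
def hssp_threshold(length: int) -> float:
    """Percentage identity threshold H for an alignment of `length` residues."""
    if length < 10:
        # Assumption: the caption gives only H > 100 here; treat as unreachable.
        return float("inf")
    if length > 80:
        return 24.7
    # Power-law branch, as read from the Fig. 1 caption.
    return 290.15 * length ** -0.562


def hssp_adjusted_score(percent_identity: float, length: int) -> float:
    """Percent identity within the alignment minus the HSSP threshold H."""
    return percent_identity - hssp_threshold(length)
```

Applied to the unrelated pair in Fig. 2 (39% identity over 64 residues), `hssp_adjusted_score(39, 64)` comes out at roughly 11, that is, well above the HSSP line even though the proteins are not homologous.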

Raw Scores. Smith-Waterman raw scores perform better 
than percentage identity (Fig. 1), but ln-scaling (7) provided no 
notable benefit in our analysis. It is necessary to be very precise 
when using either raw or bit scores because a 20% change in 
cutoff score could yield a tenfold difference in EPQ. However, 
it is difficult to choose appropriate thresholds because the 
reliability of a bit score depends on the lengths of the proteins 
matched and the size of the database. Raw score thresholds 
also are affected by matrix and gap parameters. 

Statistical Scores. Statistical scores were introduced partly 
to overcome the problems that arise from raw scores. This 
scoring scheme provides the best discrimination between 
homologous proteins and those which are unrelated. Most 
Fig. 5. Coverage vs. error plots of different sequence comparison methods: Five different sequence comparison methods are evaluated, each 
using statistical scores (E- or P-values). (A) PDB40D-B database. In this analysis, the best method is the slow SSEARCH, which finds 18% of relationships 
at 1% EPQ. FASTA ktup = 1 and WU-BLAST2 are almost as good. (B) PDB90D-B database. The quick WU-BLAST2 program provides the best coverage 
at 1% EPQ on this database, although at higher levels of error it becomes slightly worse than FASTA ktup = 1 and SSEARCH. 
likely, its power can be attributed to its incorporation of more 
information than any other measure; it takes account of the 
full substitution and gap data (like raw scores) but also has 
details about the sequence lengths and composition and is 
scaled appropriately. 

We find that statistical scores are not only powerful, but also 
easy to interpret. SSEARCH and FASTA show close agreement 
between statistical scores and actual number of errors per 
query (Fig. 4). The expectation value score gives a good, 
slightly conservative estimate of the chances of the two se- 
quences being found at random in a given query. Thus, an 
E-value of 0.01 indicates that roughly one pair of nonhomologs 
of this similarity should be found in every 100 different queries. 
Neither raw scores nor percentage identity can be interpreted 
in this way, and these results validate the suitability of the 
extreme value distribution for describing the scores from a 
database search. 

The P-values from BLAST also should be directly interpretable 
but were found to overstate significance by more than two 
orders of magnitude for 1% EPQ for this database. Nonetheless, 
these results strongly suggest that the analytic theory is 
fundamentally appropriate. WU-BLAST2 scores were more reliable 
than those from BLAST, but also exaggerate expected 
confidence by more than an order of magnitude at 1% EPQ. 
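
When comparing P-values with E-values or with EPQ, it may help to recall the usual Poisson relation between them, P = 1 - exp(-E). This relation is standard background for extreme-value database statistics rather than something stated in the text, and the function names below are hypothetical. It accounts for the behaviour noted in the Fig. 4 caption: the two quantities are nearly equal when small and diverge at higher values, since P can never exceed 1.

```python
import math


def p_from_e(e_value: float) -> float:
    """Probability of at least one chance match, given an expected count E."""
    # expm1 keeps precision when E is very small.
    return -math.expm1(-e_value)


def e_from_p(p_value: float) -> float:
    """Expected number of chance matches corresponding to a P-value below 1."""
    return -math.log1p(-p_value)
```

For example, an E-value of 0.01 corresponds to a P-value just under 0.01, whereas an E-value of 10 corresponds to a P-value very close to 1.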

Overall Detection of Homologs and Comparison of Algo- 
rithms. The results in Fig. 5A and Table 1 show that pairwise 
sequence comparison is capable of identifying only a small 
fraction of the homologous pairs of sequences in PDB40D-B. 
Even SSEARCH with E-values, the best protocol tested, could 
find only 18% of all relationships at a 1% EPQ. BLAST, which 
identifies 15%, was the worst performer, whereas FASTA 
ktup = 1 is nearly as effective as SSEARCH. FASTA ktup = 2 and 
WU-BLAST2 are intermediate in their ability to detect homologs. 
Comparison of different algorithms indicates that 
those capable of identifying more homologs are generally 
slower. SSEARCH is 25 times slower than BLAST and 6.5 times 
slower than FASTA ktup = 1. WU-BLAST2 is slightly faster than 
FASTA ktup = 2, but the latter has more interpretable scores. 

In PDB90D-B, where there are many close relationships, the 
best method can identify only 38% of structurally known 
homologs (Fig. 5B). The method which finds that many 
relationships is WU-BLAST2. Consequently, we infer that the 
differences between FASTA ktup = 1, SSEARCH, and WU-BLAST2 
programs are unlikely to be significant when compared with 
variation in database composition and scoring reliability. 

Fig. 6 helps to explain why most distant homologs cannot be 
found by sequence comparison: a great many such relation- 
ships have no more sequence identity than would be expected 
by chance. SSEARCH with E-values can recognize >90% of the 
homologous pairs with 30-40% identity. In this region, there 
are 30 pairs of homologous proteins that do not have signif- 
icant E-values, but 26 of these involve sequences with <50 
residues. Of sequences having 25-30% identity, 75% are 
identified by SSEARCH E-values. However, although the number 
of homologs grows at lower levels of identity, the detection 
falls off sharply: only 40% of homologs with 20-25% identity 
Fig. 6. Distribution and detection of homologs in PDB40D-B. Bars 
show the distribution of homologous pairs in PDB40D-B according to their 
identity (using the measure of identity in both). Filled regions indicate 
the number of these pairs found by the best database searching method 
(SSEARCH with E-values) at 1% EPQ. The PDB40D-B database contains 
proteins with <40% identity, and as shown on this graph, most 
structurally identified homologs in the database have diverged 
extremely far in sequence and have <20% identity. Note that the 
alignments may be inaccurate, especially at low levels of identity. Filled 
regions show that SSEARCH can identify most relationships that have 
25% or more identity, but its detection wanes sharply below 25%. 
Consequently, the great sequence divergence of most structurally 
identified evolutionary relationships effectively defeats the ability of 
pairwise sequence comparison to detect them. 

are detected and only 10% of those with 15-20% can be found. 
These results show that statistical scores can find related 
proteins whose identity is remarkably low; however, the power 
of the method is restricted by the great divergence of many 
protein sequences. 

After completion of this work, a new version of pairwise 
BLAST was released: BLASTGP (37). It supports gapped alignments, 
like WU-BLAST2, and dispenses with sum statistics. Our 
initial tests on BLASTGP using default parameters show that its 
E-values are reliable and that its overall detection of homologs 
was substantially better than that of ungapped BLAST, but not 
quite equal to that of WU-BLAST2. 

CONCLUSION 

The general consensus amongst experts (see refs. 7, 24, 25, 27 
and references therein) suggests that the most effective sequence 
searches are made by (i) using a large current database 
in which the protein sequences have been complexity masked 
and (ii) using statistical scores to interpret the results. Our 
experiments fully support this view. 

Our results also suggest two further points. First, the E-values 
reported by FASTA and SSEARCH give fairly accurate 
estimates of the significance of each match, but the P-values 
provided by BLAST and WU-BLAST2 underestimate the true 



Table 1. Summary of sequence comparison methods with PDB40D-B 

Method                                  Relative time*   1% EPQ cutoff       Coverage at 1% EPQ (%) 
SSEARCH % identity: within alignment    25.5             >70%                <0.1 
SSEARCH % identity: within both         25.5             34%                 3.0 
SSEARCH % identity: HSSP-scaled         25.5             35% (HSSP + 9.8)    4.0 
SSEARCH Smith-Waterman raw scores       25.5             142                 10.5 
SSEARCH E-values                        25.5             0.03                18.4 
FASTA ktup = 1 E-values                 3.9              0.03                17.9 
FASTA ktup = 2 E-values                 1.4              0.03                16.7 
WU-BLAST2 P-values                      1.1              0.003               17.5 
BLAST P-values                          1.0              0.00016             14.8 

*Times are from large database searches with genome proteins. 
extent of errors. Second, SSEARCH, WU-BLAST2, and FASTA 
ktup = 1 perform best, though BLAST and FASTA ktup = 2 
detect most of the relationships found by the best procedures 
and are appropriate for rapid initial searches. 

The homologous proteins that are found by sequence com- 
parison can be distinguished with high reliability from the huge 
number of unrelated pairs. However, even the best database 
searching procedures tested fail to find the large majority of 
distant evolutionary relationships at an acceptable error rate. 
Thus, if the procedures assessed here fail to find a reliable 
match, it does not imply that the sequence is unique; rather, it 
indicates that any relatives it might have are distant ones.** 



** Additional and updated information about this work, including 
supplementary figures, may be found at http://sss.stanford.edu/sss/. 
Annotation transfer is a principal process in genome annotation. It involves "transferring" structural and 
functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from 
experimentally characterized proteins similar in sequence. To prevent errors in genome annotation, it is 
important that this process be robust and statistically well-characterized, especially with regard to how it 
depends on the degree of sequence similarity. Previously, we and others have analyzed annotation transfer in 
single-domain proteins. Multi-domain proteins, which make up the bulk of the ORFs in eukaryotic genomes, 
present more complex issues in functional conservation. Here we present a large-scale survey of annotation 
transfer in these proteins, using scop superfamilies to define domain folds and a thesaurus based on 
SWISS-PROT keywords to define functional categories. Our survey reveals that multi-domain proteins have 
significantly less functional conservation than single-domain ones, except when they share the exact same 
combination of domain folds. In particular, we find that for multi-domain proteins, approximate function can be 
accurately transferred with only 35% certainty for pairs of proteins sharing one structural superfamily. In 
contrast, this value is 67% for pairs of single-domain proteins sharing the same structural superfamily. On the 
other hand, if two multi-domain proteins contain the same combination of two structural superfamilies, the 
probability of their sharing the same function increases to 80%; in the case of complete coverage along the full 
length of both proteins, this value increases further to >90%. Moreover, we found that only 70 of the current 
total of 455 structural superfamilies are found in both single and multi-domain proteins and only 14 of these 
were associated with the same function in both categories of proteins. We also investigated the degree to which 
function could be transferred between pairs of multi-domain proteins with respect to the degree of sequence 
similarity between them, finding that functional divergence at a given amount of sequence similarity is always 
about two-fold greater for pairs of multi-domain proteins (sharing similarity over a single domain) in 
comparison to pairs of single-domain ones, though the overall shape of the relationship is quite similar. Further 
information is available at http://partslist.org/func or http://bioinfo.mbb.yale.edu/partslist/func. 



The ultimate goal of the genome projects is to determine the 
structure and function of all the newly identified gene prod- 
ucts. Fundamentally, this will be carried out via annotation 
transfer, transferring the structural and functional annotation 
from an experimentally characterized protein (as in a model 
organism such as Escherichia coli) to a predicted protein in a 
newly sequenced genome that shares similarity in sequence. 
The degree of annotation transferred will depend on the de- 
gree of sequence similarity. This process is shown schemati- 
cally in Figure 1. In this paper, we aim to address this major 
question in bioinformatics, specifically focusing on multi- 
domain proteins, as they make up the bulk of the proteome in 
eukaryotic organisms (Gerstein 1998). 

Our work is a direct outgrowth of two previous analyses 
of ours that concentrated on single-domain proteins. In an 
earlier paper, we found that the different structural classes of 
the scop classification system have different propensities to 
carry out certain types of function (Hegyi and Gerstein 1999). 
In particular, while the alpha/beta folds were disproportion- 
ately associated with enzymes and all-alpha and small folds 
with non-enzymes, the alpha + beta structures had an equal 
tendency for both enzymatic and non-enzymatic functions. 
Wilson et al. (2000) compared a large number of protein domains 
to one another in a pair-wise fashion with respect to 
similarities in sequence, structure, and function. Using a hy- 
brid functional classification scheme merging the ENZYME 
and FlyBase systems (Gelbart et al. 1997; Bairoch 2000), they 
found that precise function is not conserved below 30-40% 
identity, although the broad functional class is usually pre- 
served for sequence identities as low as 20-25%, given that 
the sequences have the same fold. Their survey also reinforced 
the previously established general exponential relationship 
between structural and sequence similarity (Chothia and Lesk 
1986). 

Other Work on Establishing Relationships between 
Sequence, Structure, and Function 

Several other groups have studied the relationship between 
sequence, structure, and function in detail, attempting to de- 
termine the extent to which functional transference between 
matching proteins is feasible (Shah and Hunter 1997; Martin 
et al. 1998; Thornton et al. 1999, 2000; Zhang et al. 1999; 
Shapiro and Harris 2000; Todd et al. 2001). Orengo et al. 
(1999) analyzed protein families in the CATH database and 
concluded that > 96% of the folds in the PDB are associated 
with a single homologous family. By investigating enzymatic 
folds they also found that more than 95% of homologous 
families show either single or closely related functions. 
Figure 1 Schematic illustrating annotation transfer. This figure illustrates the process of annotation transfer for a group of hypothetical TIM barrel 
proteins. The leftmost panel represents sequence comparisons between idealized barrel domains from a number of organisms. The next panel 
shows analogous results for structural comparison, and the panel after that, functional comparison. The rightmost panel represents sequence 
comparisons between idealized multi-domain proteins that match over a single domain, the subject of much of this paper. 



Pawlowski et al. (2000) studied the relationship between se- 
quence and functional similarity in the twilight zone of 10%- 
15% sequence similarity and found a clear correlation be- 
tween the two, with functional similarity based on the E.C. 
classification of enzymes. 

Russell et al. (1997) analyzed binding sites in proteins 
with similar 3D structures and estimated that 90% of new 
remote homologs have common binding sites and similar 
functions. Eisenstein et al. (2000) evaluated the first results 
from the structural genomics projects and found that in many 
instances the protein structure itself offers an important clue 
to its biological function. Stawiski et al. (2000) found that 
function could be predicted rather successfully for just the 
proteases. Devos and Valencia (2000) presented a critical view 
of function transference between similar sequences, high- 
lighting the limitations of this process due to errors in data- 
bases and the inherent complexity of the relationship be- 
tween protein sequence-structure and function that does not 
allow "simplistic interpretations." They also found that bind- 
ing sites are the least conserved features between related pro- 
teins while the catalytic activity of enzymes is the most con- 
served one. 

Multi-Domain Proteins with Divergent Functions: 
How Common? 

Most of these previous investigations focused on single-domain 
proteins or did not distinguish between single- and 
multi-domain ones. It is not clear how multi-domain proteins 
with various functions behave with respect to functional 
conservation; namely, whether their functions are more or less 
conserved than those of their single-domain counterparts. In 
particular, as shown in Figure 1, if one multi-domain protein 
shares a single domain fold with another, it is not clear to what 
degree the functional conservation of these proteins is 
constrained by the shared part, and to what degree it is influenced 
by other domains that are not shared. 

Specific groups of proteins that have the same combina- 
tion of structural domains but dramatically different func- 
tions illustrate this situation. One example is the combination 



of the SH3-domain (scop superfamily identifier 2.24.2) and 
the P-loop containing NTP hydrolase (3.29.1). While in 
higher organisms this combination is associated with presyn- 
aptic and tumor suppressor functions (SWISS-PROT names 
SP02_HUMAN and DLG1_DROME, respectively), in the lower 
Dictyostelium it was found in myosin (MYSP_DICDI). An- 
other example is the combination of the FAD/NAD(P)- 
binding superfamily and FAD-linked reductases C-terminal 
superfamily (3.4.1 and 4.12.1 superfamilies, respectively). In 
one group of proteins they appear in enzymes of the oxido- 
reductase group (e.g. OXDA_CAEEL or PHHY_PSEAE), while 
in another they are found in a dissociation inhibitor (e.g. 
GDIA_HUMAN). It should be noted that the proteins are not 
covered completely by the structural matches, so it is quite 
possible that the rest of them contain totally different do- 
mains that are responsible for the dramatically different func- 
tions. However, do these two examples show a rather rare or 
a more frequent phenomenon? How often do multi-domain 
proteins, sharing the same structural domain composition, 
differ in their functions? 

In this paper, we attempt to provide a comprehensive 
answer to this question. This is particularly timely given that 
most of the unknown proteins in eukaryotic genomes are 
multi-domain. We use the same approach as in our previous 
analyses, comparing the sequences of the structural domains 
in scop to those of SWISS-PROT using blastp. We focus on 
the functional divergence of single and multi-domain pro- 
teins, extending previous investigations of single-domain 
proteins. Also, in comparison to previous work, we focus 
more on non-enzymatic functions and scop structural super- 
families, instead of folds. 

RESULTS 

Our Approach to Functional 
and Structural Assignment 

We used the blastp program (version 2.0) (Altschul et al. 
1997) to identify the scop 1.39 (Murzin et al. 1995) structural 
domains in SWISS-PROT (version 37) (Bairoch and Apweiler 
2000) with $e = 10^{-4}$. We removed the hypothetical and fragment 
proteins. This resulted in two sets of proteins. 

Single-Domain 

Of the single-domain matches, only those that were almost 
completely covered with a match to a single structural domain 
were selected. (The maximum number of uncovered 
residues was set at 70 with an additional condition that a 
maximum of 40 residues on the N-terminal end and 30 resi- 
dues on the C-terminus were allowed to be uncovered.) These 
criteria resulted in 1818 single-domain proteins being selected 
from SWISS-PROT. 
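
The single-domain coverage criteria above can be sketched as a small filter. This is only an illustration under stated assumptions: the `DomainMatch` record, its field names, and the 1-based inclusive coordinates are hypothetical and are not taken from the original analysis pipeline.

```python
from dataclasses import dataclass

# Thresholds quoted in the text for single-domain proteins.
MAX_UNCOVERED_TOTAL = 70
MAX_UNCOVERED_N_TERM = 40
MAX_UNCOVERED_C_TERM = 30


@dataclass(frozen=True)
class DomainMatch:
    """Hypothetical record for one structural-domain match on a protein."""
    protein_length: int
    start: int  # first matched residue, 1-based inclusive (assumption)
    end: int    # last matched residue, 1-based inclusive (assumption)


def is_almost_fully_covered(match: DomainMatch) -> bool:
    """Return True if the match leaves few enough residues uncovered."""
    n_gap = match.start - 1                    # uncovered N-terminal residues
    c_gap = match.protein_length - match.end   # uncovered C-terminal residues
    return (
        n_gap <= MAX_UNCOVERED_N_TERM
        and c_gap <= MAX_UNCOVERED_C_TERM
        and n_gap + c_gap <= MAX_UNCOVERED_TOTAL
    )
```

With these thresholds the total limit of 70 is implied by the two terminal limits (40 + 30), but it is kept as an explicit check so that each criterion from the text is visible.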

Multi-Domain 

We selected 4763 multi-domain proteins from SWISS-PROT. 
All of these matched (in different locations) at least two do- 
mains of known structure belonging to different scop super- 
families (see schematic in Figure 1). We also selected a subset 
of these proteins that have almost their entire length covered 
by matches with structural domains (allowing again a maxi- 
mum of 70 uncovered residues). This selection resulted in 
2829 proteins being selected from SWISS-PROT. (In all cases, 
duplicate matches were removed, i.e., a protein at a certain 
location matches only one structural domain.) 

We set out to compare these two sets of proteins for 
functional divergence. As previously, we divided functions 
into enzyme and non-enzyme (Hegyi and Gerstein 1999). En- 
zymatic functions were classified by the EC system (Bairoch 
2000). Comparisons of enzymatic functions were treated the 
same way as in our earlier analyses, that is, if they differ in the 
first three components of their respective EC numbers, they 
were considered different. This implied that our analysis dealt 
with a total of 112 enzymatic functions. Non-enzymatic func- 
tions were classified into 508 different categories based on a 
simple thesaurus we assembled of synonymous keywords 
drawn from SWISS-PROT description lines. In addition, we 
created 49 categories for functions that have an enzymatic 
component but which are not part of the EC system. This gave 
us a total of 669 functions (112 + 508 + 49). (The list of all the 
functional categories is described further in Table 2 below, 
and also can be found on the Web at http://bioinfo.mbb.yale.edu/partslist/func 
or http://partslist.org/func.) 
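
The rule for comparing enzymatic functions (two enzymes are treated as different only if they differ within the first three components of their EC numbers) can be illustrated with a short sketch. The function names are hypothetical, and the EC numbers are assumed to be plain dotted strings such as "3.5.1.5".

```python
from typing import Tuple


def ec_class(ec_number: str) -> Tuple[str, ...]:
    """Return the first three components of a dotted EC number."""
    return tuple(ec_number.strip().split(".")[:3])


def same_enzymatic_function(ec_a: str, ec_b: str) -> bool:
    """Two enzymes count as the same function if the first three EC components agree."""
    return ec_class(ec_a) == ec_class(ec_b)


# Category counts quoted in the text: enzymatic, non-enzymatic,
# and enzymatic functions outside the EC system.
N_FUNCTIONS = 112 + 508 + 49
```

Under this rule an endoglucanase (3.2.1.4) and a glucoamylase (3.2.1.3) fall into the same category, while enzymes from 6.3.5 and 6.3.4 do not.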

Overall Distribution of the Matches 

Figure 2 shows the most commonly observed multi-domain 
combinations in a set of recently sequenced genomes. The 
occurrences of further combinations are available from the 
Web site. Clearly, the distribution is very skewed, with certain 
combinations, such as 3.29-2.32, and 2.29-4.61 tending to 
predominate. 

Figure 3 shows the overall distribution of the single- 
domain and multi-domain matches in the different structural 
classes. The distribution of matches between enzymes and 
non-enzymes in multi-domain proteins largely agrees with 
that in the single-domain proteins. The multi-domain 
matches follow the overall tendency of the alpha/beta folds to 
be associated with enzymes to a larger extent and the all- 
alpha and small folds with non-enzymes. However, the values 
for the multi-domain matches are generally less extreme than 
for single-domains; for example, the 10-fold difference be- 
tween single-domain alpha/beta enzymes and non-enzymes 
decreases to about twofold in multi-domain proteins. Another 
significant difference is the reduction in the number of multi- 
domain non-enzymes in the all-beta and alpha + beta struc- 

Figure 2 Distribution of multi-domain combinations amongst the 
genomes. The figure shows the occurrence of multi-domain fold com- 
binations in a number of genomes, indicating its great variability. 
Each row indicates a particular combination of scop fold pairs (using 
scop 1.39), where a fold pair is defined as two distinct folds occurring 
in tandem in a protein. Each column represents a different genome, 
using the four-letter codes in the PartsList system (Qian et al. 2001): 
Aaeo, Aquifex aeolicus; Aful, Archaeoglobus fulgidus; Bbur, Borrelia 
burgdorferi; Bsub, Bacillus subtilis; Cele, Caenorhabditis elegans; Cpne, 
Chlamydia pneumoniae; Ctra, Chlamydia trachomatis; Ecol, Escherichia 
coli; Hinf, Haemophilus influenzae Rd; Hpyl, Helicobacter pylori; Mthe, 
Methanobacterium thermoautotrophicum; Mjan, Methanococcus jannaschii; 
Mtub, Mycobacterium tuberculosis; Mgen, Mycoplasma genitalium; 
Mpne, Mycoplasma pneumoniae; Phor, Pyrococcus horikoshii; 
Rpro, Rickettsia prowazekii; Scer, Saccharomyces cerevisiae; Syne, 
Synechocystis sp.; Tpal, Treponema pallidum. The numbers in each 
intersection cell indicate the number of times the fold pairs occur in a 
genome. Only the 20 most common fold pair combinations are 
shown here; the remainder are shown on the Web site 
(http://partslist.org/func). If the value in a cell is greater than 6, it is shaded black; 
between 3 and 6, gray; and below 3, white. The blank spaces show 
instances in which one of the pairs does not occur in the organism at 
all (indicated by a value of -1 in the data table on the Web site). The 
fold assignments are done in a fashion consistent with those in 
PartsList and associated systems (Gerstein 1997; Lin et al. 2000; Dra- 
wid et al. 2001; Harrison et al. 2001; Qian et al. 2001). 
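
The shading rule described in this caption can be written out as a small lookup. This is a sketch only: the function name is hypothetical, and treating "between 3 and 6" as inclusive of both endpoints is an assumption, since the caption does not state the boundary behavior explicitly.

```python
def cell_shade(count: int) -> str:
    """Map a fold-pair occurrence count to the shading used in Figure 2.

    A value of -1 marks a fold pair in which one of the folds does not
    occur in the organism at all; such cells are left blank.
    """
    if count == -1:
        return "blank"
    if count > 6:
        return "black"
    if count >= 3:  # 3 to 6 inclusive (assumed)
        return "gray"
    return "white"
```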



tural classes compared to the single-domain matches. Alto- 
gether, there are more enzymes than non-enzymes among the 
multi-domain proteins (2805 enzymes vs. 1958 non-enzymes) 
whereas for single-domain proteins, the opposite is true (850 
enzymes vs. 968 non-enzymes). 

Table 1 summarizes the distribution of superfamilies and 
superfamily combinations among the major functional 
classes, i.e. whether they have only enzymatic, only non- 
enzymatic or both enzymatic and non-enzymatic functional- 
ity. Altogether, 215 superfamilies were found in single-domain 
proteins and 310 in multi-domain ones. As 70 superfamilies 
were found in both, altogether 455 distinct structural super- 
families matched a SWISS-PROT protein with our required 
coverage criteria (described above). Similarly, we apportioned 
the 281 superfamily combinations observed in multi-domain 
proteins amongst different broad functional categories. 

In single-domain proteins there are about as many su- 
perfamilies with exclusively enzymatic functionality as there 
are those with exclusively non-enzymatic functions (82 vs. 
78). In contrast, in multi-domain proteins this ratio increases 

Figure 3 Distribution of proteins amongst broad structural and 
functional classes; the distribution of the matches among the seven 
structural and two functional classes in single- and multi-domain pro- 
teins. The single-domain and multi-domain matches each total 
100%, independently of each other. The horizontal axis indicates the 
seven scop classes, which are (from 1 to 7): all-alpha, all-beta, alpha/ 
beta, alpha + beta, multi-domain, membrane, and small protein. 



to almost threefold (135 vs. 56). This agrees with the notion 
that most enzymes are multi-domain. Another difference be- 
tween single and multi-domain proteins appears in the ratio 
of superfamilies with a single function compared to multifunctional 
ones. As is apparent from Table 1, about a quarter 
of the superfamilies matched single-domain proteins with 
different functions (55 of 215), whereas in the multi-domain 
proteins, this ratio increased to more than a third (119 of 310). 

Single-Domain Proteins 

Table 2 lists the two functionally most diverse structural su- 
perfamilies in single-domain proteins with some representa- 
tive functions. The most diverse superfamily, the 3.38.1 
Thioredoxin-like, has 11 different functions associated with 
it, most of them with an oxidoreductase mechanism. For in- 
stance, THIO_BPT4 is a small disulphide-containing thiore- 
doxin that serves as a general disulphide oxidoreductase, 



while TDX2_BRUMA is almost twice as long (199 aa) and 
serves as a thiol-specific antioxidant that acts against sulfur-containing 
radicals. Another interesting example of functional 
diversity is provided by the Scorpion toxin-like superfamily 
(7.3.6). While BRAZ_PENBA is a small protein that is 
known to be 2000 times sweeter than sucrose, the other mem- 
bers of the superfamily are associated with different host- 
defense mechanisms. In insects the superfamily possesses 
antifungal activity (DMYC_DROME) or acts as a toxin 
(SCX5_BUTEU). Interestingly, in plants it can also act as an 
antifungal (AF2B_SINAL) or as an inhibitor of insect alpha-amylases 
(SIA1_SORBI). It appears that many single-domain 
proteins are toxins or allergens, or are related in other ways to 
a host-defense response. 

Based on the data we can also determine the probability 
of two single-domain proteins that match domains in the 
same superfamily category also carrying out the same func- 
tion. Using Bayes' theorem: 

$$P(F \mid S) = \frac{P(F)\,P(S \mid F)}{P(F)\,P(S \mid F) + P({\sim}F)\,P(S \mid {\sim}F)} \qquad (1)$$

where S is the event that two proteins share the same 
superfamily, F is the event that two proteins have the 
same function, and ~F is the event that two proteins do 
not have the same function. Rearranging and simplifying the 
equation we get: 

$$P(F \mid S) = \frac{1}{1 + N(S, {\sim}F)/N(S, F)} \qquad (2)$$

where N is the number of times that the two events in the 
parentheses occur together in our database of 1818 single- 
domain proteins. This results in 

P(F|S) = 1/(1 + 8501/12516) = 68%. 

That is, the probability that two single-domain proteins that 
have the same superfamily structure have the same function 
(whether enzymatic or not) is about 2/3. 
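
The rearrangement from equation 1 to equation 2 can be spelled out step by step. The sketch below assumes that each joint probability is estimated by the corresponding pair count divided by the total number of pairs, so that the total cancels in the ratio; that estimation step is an assumption made here for illustration.

```latex
\begin{align*}
P(F \mid S)
  &= \frac{P(F)\,P(S \mid F)}{P(F)\,P(S \mid F) + P({\sim}F)\,P(S \mid {\sim}F)} \\
  &= \frac{P(S, F)}{P(S, F) + P(S, {\sim}F)} \\
  &= \frac{1}{1 + P(S, {\sim}F)/P(S, F)}
   \approx \frac{1}{1 + N(S, {\sim}F)/N(S, F)}
\end{align*}
```

The second line uses the product rule, $P(F)\,P(S \mid F) = P(S, F)$, and the third divides numerator and denominator by $P(S, F)$.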

Multi-Domain Proteins 

Table 3 lists the combinations of superfamilies that have been 
associated with the greatest number of different functions in 
multi-domain proteins, with representative entries in SWISS- 
PROT. The combination with the greatest number of different 
functions is that of 1.95.1 and 7.33.1. Although it has twice as 
many different functions as the most diverse superfamily in 



Table 1. Functional Distribution of Single-domain, Multi-domain Superfamilies, and 
Multi-domain Combinations 

                 Single-domain          Multi-domain           Multi-domain sfam
                 superfamilies          superfamilies          combinations
                 Single     Multiple    Single     Multiple    Single     Multiple
                 function   function    function   function    function   function

Enzymatic        82         11          135        42          151        16
Nonenzymatic     78         23          56         30          70         27
Both functions   -          15          -          47          -          17
Total            160        55          191        119         221        60

The basic functional distribution of the superfamilies in single- and multi-domain proteins and the 
functional distribution of multi-domain combinations are shown. The first row lists the number of 
scop superfamilies that were associated only with enzymatic function in each category. The second 
row lists the number associated with only nonenzymatic functions, and the third row indicates the 
number of superfamilies that were associated with both types of function. Altogether, we 
characterized 160 + 55 = 215 single-domain and 191 + 119 = 310 multi-domain superfamilies, 70 of 
which overlapped in the two categories. 
Table 2. Most Versatile Single-Domain Superfamilies 

No.   No.   Sfam
func  prot  comb    Function      SWISS-PROT ID  SWISS-PROT function

11    69    3.38.1  E1.11.1       GSHP_RAT       Plasma Glutathione Peroxidase (1.11.1.9)
                    263#          DYL5_CHLRE     Dynein, Flagellar Outer Arm-C. reinhardtii
                    D260#         BSAA_BACSU     Glutathione Peroxidase Homolog Bsaa
                    268#          REHY_TORRU     Rehydrin-Tortula ruralis (Moss)
                    266#          PHOS_HUMAN     Phosducin (33 Kd Phototransducing Protein)
                    269#          REHY_ORYSA     Rad24 Protein-Oryza sativa (Rice)
                    272#          THIO_BPT4      Thioredoxin (Bacteriophage T4)
                    D271#272#     TDX2_BRUMA     Thioredoxin Peroxidase 2
                    261#          BTUE_ECOLI     Vitamin B12 Transport Periplasmic Protein Btue

10    28    7.3.6   342#          BRAZ_PENBA     Brazzein-Pentadiplandra brazzeana
                    376#336#      SCKK_TITSE     Neurotoxin Ts-Kapa (Tsk)-(Brazilian scorpion)
                    341#356#      AF2B_SINAL     Cysteine-Rich Antifungal Protein 2b (Afp2b)
                    343#          DEFA_ZOPAT     Defensin, Isoforms B And C-Zophobas atratus
                    361#          DMYC_DROME     Drosomycin Precursor (Cysteine-Rich Peptide)
                    361#376#      SCX5_BUTEU     Insectotoxin I5a-(Lesser Asian scorpion)
                    336#          SCX3_LEIQH     Leiuropeptide III-(Scorpion)
                    203#          SIA1_SORBI     Small-Pr Inhibitor Of Insect Alpha-Amylases

                    310#          AB18_PEA       Aba-Responsive Protein Abr18-Garden Pea
                    311#          DRR3_PEA       Disease Resistance Response Protein Pi49
                    231#          MPAA_CORAV     Major Pollen Allergen Cor A 1-Eu. Hazel
                    312#          L18B_LUPLU     Protein Llr18b (Llpr10.1b)
                    E3.1.-        RNS2_PANGI     Ribonuclease 2 (3.1.-.-)-Panax Ginseng
                    314#          SAM2_SOYBN     Stress-Induced Protein Sam22

      43    1.26.1  184#          CSF2_SHEEP     Colony-Stimulating Factor
                    381#564#184#  IL4_RAT        Interleukin-4 (B-Cell Igg Diff. Factor)
                    185#          LIF_HUMAN      Leukemia Inhibitory Factor (Lif)
                    187#          PRL_ANGAN      Prolactin Precursor (Prl)-
                    186#          PLF3_MOUSE     Proliferin 3 Mitogen-Regulated
                    188#          SOMA_PAROL     Somatotropin (Growth Hormone)


The most versatile superfamilies in single-domain proteins as determined from their functional description in SWISS- 
PROT, with some representatives. The keyword combinations in the fourth column were based either on the first three 
components of their EC numbers (for enzymes) or derived automatically by comparing the DE description line of 
SWISS-PROT entries to a list of synonymous keywords at http://bioinfo.mbb.yale.edu/partslist/func. A keyword num- 
ber starting with a D indicates an enzyme that does not have an assigned EC number in its description in SWISS-PROT. 



the single-domain proteins (22 vs. 11, respectively), careful 
examination reveals that all the proteins in this category are 
DNA-binding and most of them act as hormone receptors. 

The second entry listed in the table is the combination of 
the 3.4.1 and 4.48.1 superfamilies associated with the FAD/ 
NAD(P)-linked reductases. It is an all-enzymatic combination 
and always carries out an oxido-reductase function. All the 
proteins in this category are completely covered by matches 
with these two superfamilies. The 1.78.1-2.1.1 hemocyanin-immunoglobulin 
combination seems also to be fairly conserved; 
although the proteins in this category are called by 
eight different names, most of them turn out to be extracel- 
lular larval storage proteins, except for the copper-containing 
oxygen carrier hemocyanin itself (HCY_PALVU). 

Following the same logic, we can also determine the 
probability that two proteins that have the same superfamily 
combination share the same function, viz: 

P(F|S) = 1/(1 + 32242/134230) = 81% 

This means that we have significantly greater certainty in de- 
termining the function of a multi-domain protein with a par- 
ticular superfamily combination than that of a single-domain 
protein containing a particular superfamily. We also deter- 
mined a similar probability for those proteins that have an 



almost complete coverage with exactly the same type and 
number of superfamilies, following each other in the same 
order. The probability that the functions are the same in this 
case was 91%, a considerably higher value than above. How- 
ever, if two multi-domain proteins share only a single super- 
family, the probability that they share the same function 
drops to only 35%! This greater functional certainty from 
sharing a combination of superfamilies rather than just one is 
also reflected in Table 1. While one-fourth of the single- 
domain proteins and one-third of singularly matching super- 
families in multi-domain proteins have multiple functions, 
only about one-fifth of the multi-domain combinations pos- 
sess multiple functions (60 of 281). It is also clear from the 
data that domains in larger proteins often lose their original 
function and no longer have an autonomous function. 
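
The 81% figure quoted above for proteins sharing the same superfamily combination follows directly from equation 2 and the two pair counts given in the text. A minimal sketch, in which the function and argument names are illustrative and not taken from the original analysis:

```python
def prob_same_function(n_shared_diff_func: int, n_shared_same_func: int) -> float:
    """Equation 2: P(F|S) = 1 / (1 + N(S,~F) / N(S,F)).

    n_shared_diff_func -- pairs sharing the structure but not the function, N(S,~F)
    n_shared_same_func -- pairs sharing both structure and function, N(S,F)
    """
    if n_shared_diff_func < 0 or n_shared_same_func < 0:
        raise ValueError("pair counts must be non-negative")
    if n_shared_same_func == 0:
        if n_shared_diff_func == 0:
            raise ValueError("no pairs share the structure; P(F|S) is undefined")
        return 0.0
    return 1.0 / (1.0 + n_shared_diff_func / n_shared_same_func)


# Pair counts quoted in the text for shared superfamily combinations.
p_combination = prob_same_function(32242, 134230)
print(f"{p_combination:.0%}")  # prints 81%
```

The same value is obtained from the equivalent form N(S,F) / (N(S,F) + N(S,~F)); the form above is kept because it mirrors equation 2 term by term.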

Seventy Common Superfamilies and Their 
Functions Compared in Single-Domain 
and Multi-Domain Proteins 

As mentioned above, of the 455 superfamilies in our analysis, 
only 70 occur in both single- and multi-domain proteins. 
Even more surprising is the small number of structural super- 
families (14) that have the same function in both single- and 
Table 3. Most Versatile Superfamily Combinations in Multi-Domain Proteins 

No.   No.   Sfam
func  prot  comb.          Function      SWISS-PROT ID  SWISS-PROT function

22    176   1.95.1/7.33.1  29#           THB_RANCA      Thyroid Hormone Receptor Beta
                           10#           HNF4_DROME     Transcription Factor HNF-4 Homolog
                           31#32#        EAR2_MOUSE     V-Erba Related Protein Ear-2
                           29#30#        ECR_MANSE      Ecdysone Receptor (Ecdysteroid Receptor)
                                         ERBA_AVIER     Erba Oncogene Protein
                           556#564#35#   NGFI_XENLA     Nerve Growth Factor Induced Protein I-B
                           576#          NR42_HUMAN     Immediate-Early Response Protein Not
                           36#           PPAT_HUMAN     Peroxisome Proliferator Activated Receptor
                           37#           RXRG_CHICK     Retinoic Acid Receptor RXR-Gamma
                           38#           TLL_DROVI      Tailless Protein

8     54    3.4.1/4.48.1   E1.8.2        DHSU_CHRVI     Sulfide Dehydrogenase (1.8.2.-)
                           E1.8.1        DLDH_ZYMMO     Dihydrolipoamide Dehydrogenase (1.8.1.4)
                           E1.6.4        TYTR_TRYGR     Trypanothione Reductase (1.6.4.8) (Tr)
                           E1.16.1       MERA_STRLI     Mercuric Reductase (1.16.1.1)
                           E1.6.99       NAOX_MYCPN     Probable NADH Oxidase (1.6.99.3) (Noxase)

8     23    1.78.1/2.1.1   19#           ARYB_MANSE     Arylphorin Beta Subunit-(Tobacco Hornworm)
                           20#           CRPI_PERAM     Allergen Cr-Pi Precursor-(American Cockroach)
                           21#427#       HCY_PALVU      Hemocyanin-(European Spiny Lobster)
                           22#           HEXA_BLADI     Hexamerin Precursor-(Tropical Cockroach)
                           23#           JSP1_TRINI     Acidic Juvenile Hormone-Suppressible Protein
                           24#           LSP2_DROME     Larval Serum Protein 2 Precursor (LSP-2)
                           546#25#       SSP1_BOMMO     Sex-Specific Storage-Protein 1



Note that the combination with the greatest number of different functions is that of 1.95.1 and 7.33.1. Careful 
examination reveals that all the proteins with this combination are DNA-binding and most of them act as various 
hormone receptors. In particular, HNF4_DROME and NR42_HUMAN also have transcription activator functions. Note 
that these two proteins are considerably longer than the others in this group and are not covered completely by 
structural matches: A large C-terminal and a large N-terminal portion are left uncovered, respectively. 



multi-domain proteins. These are listed in Table 4; 12 of them 
have enzymatic function, supporting the notion that en- 
zymes are more conserved during evolution than non- 
enzymes. The two non-enzymatic superfamilies are the 4.29.1 
ribosomal superfamily and the 5.4.1 superfamily in penicillin- 
binding proteins. 

Table 5 presents several examples of the converse situa- 
tion, shared superfamilies that have different functions in 
single and multi-domain proteins. Comparing parts A and B 
of the table highlights the fact that although both superfami- 



lies in a multi-domain protein are often present in single- 
domain form as well, the functions in the different settings 
are only vaguely related. One example is the combination of 
the lipocalin superfamily (2.45.1) with that of the BPTI-like or 
Kunitz inhibitor (7.7.1), which in higher organisms forms a 
complex protein called alpha-1-microglobulin (AMBP_RAT). 
Another interesting example is the combination of the 2.5.1 
Cupredoxin (occurring in the single-domain blue-copper pro- 
tein, SOXE_SULAC) and the 6.5.1 Membrane all-alpha 
(single-domain representative: BACT_HALVA, a sensory rho- 



Table 4. Superfamilies With the Same Function in Single- and Multi-Domain Proteins as Determined from Their Keyword 
Combination or First Three Components of Their EC Numbers 

Sfam    Function  Setting        SWISS-PROT ID  SWISS-PROT function

1.81.1  E3.2.1    Single-domain  GUNY_ERWCH     Endoglucanase (3.2.1.4)
                  Multi-domain   AMYG_NEUCR     Glucoamylase Precursor (3.2.1.3)
2.66.2  E3.5.1    Single-domain  URE2_YERPS     Urease Beta (3.5.1.5)
                  Multi-domain   URE1_HELPY     Urease Alpha Subunit (3.5.1.5)
3.17.2  E6.3.5    Single-domain  NADE_MYCPN     NAD(+) Synthetase (6.3.5.1)
                  Multi-domain   GUAA_YEAST     GMP Synthase (6.3.5.2)
3.37.1  E3.1.3    Single-domain  PTP2_NPVOP     Protein-Tyrosine Phosphatase 2 (3.1.3.48)
                  Multi-domain   PTNB_RAT       Protein-Tyrosine Phosphatase (3.1.3.48)
3.67.1  E4.2.1    Single-domain  TRPB_VIBPA     Tryptophan Synthase (4.2.1.20)
                  Multi-domain   TRP_YEAST      Tryptophan Synthase (4.2.1.20)
4.19.1  E5.2.1    Single-domain  FKB1_METJA     Peptidylprolyl Cis-Trans Isomerase (5.2.1.8)
                  Multi-domain   FKB7_WHEAT     70 Kd Peptidylprolyl Isomerase (5.2.1.8)
4.2.1   E3.2.1    Single-domain  LYCV_BPP2      Lysozyme (3.2.1.17)
                  Multi-domain   CHIX_PEA       Endochitinase Precursor (3.2.1.14)
4.29.1  85#       Single-domain  RS5_ACYKS      30s Ribosomal Protein S5
                  Multi-domain   RS5_TREPA      30s Ribosomal Protein S5
4.52.1  E3.4.24   Single-domain  SNPA_STRCS     Extracellular Neutral Protease (3.4.24.-)
                  Multi-domain   BMPH_STRPU     Collagenase 3 Precursor (3.4.24.-)
4.6.1   E3.5.1    Single-domain  URE3_YERPS     Urease Gamma (3.5.1.5)
                  Multi-domain   URE1_HELPY     Urease Alpha Subunit (3.5.1.5)
5.10.1  E2.7.7    Single-domain  KANU_STAAU     Kanamycin Nucleotidyltransferase (2.7.7.-)
                  Multi-domain   DPOB_XENLA     Dna Polymerase Beta (2.7.7.7)
5.4.1   161#      Single-domain  AMPH_ECOLI     Penicillin-binding Protein Amph
                  Multi-domain   PBPX_STRPN     Penicillin-binding Protein 2x Pbp2x
Table 5. Examples of Superfamilies Present in Both Single- and Multi-Domain Proteins, 
Carrying out Different Functions 

Table 5A. Single-Domain Proteins 

Sfam     Funct #    SWISS-PROT ID  SWISS-PROT function

1.25.1   352#       FTN2_HAEIN     Ferritin-like Protein 2
         183#       NIGY_DESVH     Nigerythrin
         E1.17.4    RIR4_YEAST     (Ribonucleotide Reductase) (1.17.4.1)
         192#       NLP_HAEIN      Ner-like Protein Homolog
1.4.3    196#       H1A_PLADU      Histone H1A, Sperm
1.81.2   E2.5.1     PFTB_PEA       Farnesyltransferase Beta Su (2.5.1.-)
2.45.1   226#       ERBP_RAT       Epididymal-Retinoic Acid Binding Protein
         227#       FAB3_CAEEL     Fatty Acid-Binding Protein Homolog 3
         228#412#   NGAL_MOUSE     Neutrophil Gelatinase-Assoc. Lipocalin
         229#       NP4_RHOPR      Nitrophorin 4 Precursor
         E5.3.99    PGHD_HUMAN     Prostaglandin-H2 D-Isomerase (5.3.99.2)
         230#421#   VNS1_MOUSE     Vomeronasal Secretory Protein I
2.5.1    231#       MPA3_AMBEL     Pollen Allergen AMB A 3 (AMB A III)
         232#427#   SOXE_SULAC     Sulfocyanin (Blue Copper Protein)
3.14.2   373#       RRF1_DESVH     Rrf1 Protein
3.29.1   E6.3.4     PURA_CAEEL     Adenylosuccinate Synthetase (6.3.4.4)
         E2.7.4     KTHY_YEAST     Thymidylate Kinase (2.7.4.9)
         D259#      VA57_VACCV     Guanylate Kinase Homolog
         E2.7.1     KITH_VZVW      Thymidine Kinase (2.7.1.21)
3.47.1   275#       MBL_BACSU      MBL Protein
         276#       MREB_BACSU     Rod Shape-determining Protein Mreb
3.48.1   E3.1.3     PPA5_YEAST     Repressible Acid Phosphatase (3.1.3.2)
3.81.1   D281#      AMIC_PSEAE     Aliphatic Amidase Expression-Regulator
         282#       LUXP_VIBHA     LUX P Protein Precursor
4.103.1  E2.4.2     TOX1_BORPE     Pertussis Toxin Su 1 (2.4.2.-)
4.105.1  291#       LECC_POLMI     Lectin-Polyandrocarpa Misakiensis
4.11.5   295#       TERP_PSESP     Terpredoxin
4.19.1   E5.2.1     FKB1_METJA     Pept-Prolyl Cis-Trans Isomerase (5.2.1.8)
6.5.1    E3.6.1     ATPL_VIBAL     ATP Synthase (3.6.1.34) (Lipid-binding)
         540#325#   BACT_HALVA     Sensory Rhodopsin II (Sr-II)
7.35.4   E1.9.3     COXB_RAT       Cytochrome C Oxidase (1.9.3.1) (Via*)
         345#       DESR_DESBI     Desulforedoxin (Dx)
7.7.1    349#       TAP_ORNMO      Tick Anticoagulant Peptide



d opsin) superfamilies into a component of the respiratory 
chain, cytochrome C oxidase II (COX2_ZOOAN). All these 
examples demonstrate the evolutionary advantage of a do- 
main fusion event, which creates a function that is more com- 
plex than either of the components. 

Multifunctionality vs. Sequence Similarity 

Previously, we presented a variety of graphs that show how the probability that 
two domains would share the same function varied with respect to sequence 
similarity (Hegyi and Gerstein 1999; Wilson et al. 2000). Figure 4 shows a similar 
graph with the calculations extended to multi-domain proteins. The figure shows 
that the functional divergence of a single domain in multi-domain proteins 
dramatically increases, more than twofold, compared to the single-domain ones. 
This reinforces our findings above, based only on superfamily content, that the 
certainty with which we can predict the function of a protein based on its 
sequence similarity with a domain in another multi-domain protein is considerably 
less than for a comparable single-domain situation. 
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Table 5B. Multi-Domain Proteins 



Sfam Comb.      Funct #     SWISS-PROT ID   SWISS-PROT function

1.25.1/7.35.4   104#        RUBY_METJA      Putative Rubrerythrin

1.32.1/3.81.1   11#         PURR_HAEIN      Purine Nucleotide Synthesis Repressor
                12#         DEGA_BACSU      Degradation Activator
                581#11#     SCRR_STRMU      Sucrose Operon Repressor
                582#11#     REGA_CLOAB      Transcription Regulatory Protein Rega

1.4.3/3.14.2    10#         SKN7_YEAST      Transcription Factor Skn7 (Pos9 Protein)
                11#         VIRG_AGRT5      Virg Regulatory Protein
                13#         RGX3_MYCTU      Sensory Transduction Protein REGX3
                190#        PFER_PSEAE      Transcriptional Activator Protein Pfer
                366#        PETR_RHOCA      Petr Protein

2.45.1/7.7.1    203#153#    HC_RAT          Alpha-1-Microglobulin/Trypsin Inhibitor

2.5.1/6.5.1     E1.9.3      COX2_ZOOAN      Cytochrome C Oxidase II (1.9.3.1)

3.29.1/3.48.1   E2.7.1      F26_RANCA       6-Phosphofructo-2-Kinase (2.7.1.105)

3.47.1/5.17.1   1#          YEDO_YEAST      Heat Shock Protein 70 Homolog YEL030w
                1#83#       GR73_MAIZE      Ig-Binding Protein



DISCUSSION 

Here we built on our previous studies on the relationship 
between protein structure and function to develop new results 
related to multi-domain proteins. Throughout the paper, 
we focused on superfamilies instead of folds, as the members 
of a superfamily are presumably of common evolutionary ori- 
gin (Murzin et al. 1995). 

We found that the 4763 multi-domain and 1818 single- 
domain proteins that met our selection criteria have about 
the same distribution of structural classes, with more enzy- 
matic functions associated with the alpha/beta structural 
classes and more non-enzymatic ones with the all-alpha and 
small classes. We identified more than three times as many 
multi-domain proteins that were enzymes than single- 
domain ones (2805 and 850, respectively) and, conversely, 
about twice as many multi-domain proteins as single-domain 
ones that were non-enzymes (1958 vs. 968). 
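
The counts in the paragraph above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the dictionary layout and the function name are assumptions introduced here, and only the four counts come from the text.

```python
# Counts quoted in the paragraph above (multi-domain vs. single-domain proteins).
multi = {"enzyme": 2805, "non_enzyme": 1958}
single = {"enzyme": 850, "non_enzyme": 968}


def totals_and_ratios(multi_counts, single_counts):
    """Return group totals and the multi/single ratio for each category."""
    return {
        "multi_total": sum(multi_counts.values()),
        "single_total": sum(single_counts.values()),
        "enzyme_ratio": multi_counts["enzyme"] / single_counts["enzyme"],
        "non_enzyme_ratio": multi_counts["non_enzyme"] / single_counts["non_enzyme"],
    }


summary = totals_and_ratios(multi, single)
print(summary)
```

The totals reproduce the 4763 and 1818 proteins stated in the text, and the two ratios (about 3.3 and about 2.0) agree with "more than three times as many" and "about twice as many".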

We focused on the functional divergence of the two 
groups and found that about a quarter of the superfamilies in 
single-domain proteins are associated with multiple func- 
tions, whereas only about a fifth of the multi-domain super- 
family combinations are. Therefore, we can conclude that a 
combination of specific superfamilies results in a more spe- 
cific functional assignment for a particular protein. However, 
about one-third of the superfamilies in the multi-domain pro- 
teins were associated with multiple functions, underlining 
the lesser autonomy of a domain function in multi-domain 
protein. 

This latter finding was also supported by the difference 
in functional divergences between the two groups of proteins 
based on particular sequence similarities between the do- 
mains and SWISS-PROT proteins. As is shown in Figure 4, the 
average functional divergence of a single domain is much 
larger (more than twofold) in multi-domain proteins than in 
single-domain ones. 

We also found that only 70 of a total of 455 superfamilies are shared between the 
multi-domain and single-domain proteins and only a small fraction (14) share 
their functions. This was rather surprising to us, and should be taken into 
consideration in functional characterization and annotation of new gene products. 
When the functions were related in single- and multi-domain proteins, we could 
observe an increasing functional complexity with the appearance of large 
multi-domain proteins. 

Altogether, with the recent sequencing of the human genome and the genomes of 
other model organisms, we hope that this work can contribute to the successful 
annotation of the individual gene products, and will help to avoid some pitfalls 
associated with the functional characterization of large, complex proteins. 

Figure 4 Divergence in function with respect to sequence similarity. Relative 
number of matching domains with multiple functions, as the function of e-value 
threshold. Diamonds represent single-domain proteins, squares multi-domain ones 
(matching just for a single domain), respectively. The X-axis shows 
-log(e-value); the first value on the X-axis starts at 4 (corresponding to an 
e-value = 10^-4). 

The publication costs of this article were defrayed in part 
by payment of page charges. This article must therefore be 
hereby marked "advertisement" in accordance with 18 USC 
section 1734 solely to indicate this fact. 
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Figure 2-42 

Structure of a β-turn. The CO group 
of residue 1 of the tetrapeptide shown 
here is hydrogen bonded to the NH 
group of residue 4, which results in 
a hairpin turn. 



POLYPEPTIDE CHAINS CAN REVERSE DIRECTION 
BY MAKING β-TURNS 

Most proteins have compact, globular shapes due to frequent reversals of the 
direction of their polypeptide chains. Analyses of the three-dimensional 
structures of numerous proteins have revealed that many of these chain reversals 
are accomplished by a common structural element called the β-turn. The essence of 
this hairpin turn is that the CO group of residue n of a polypeptide is hydrogen 
bonded to the NH group of residue (n + 3) (Figure 2-42). Thus, a polypeptide 
chain can abruptly reverse its direction. 

LEVELS OF STRUCTURE IN PROTEIN ARCHITECTURE 

In discussing the architecture of proteins, it is convenient to refer to four 
levels of structure. Primary structure is simply the sequence of amino acids and 
location of disulfide bridges, if there are any. The primary structure is thus a 
complete description of the covalent connections of a protein. Secondary 
structure refers to the steric relationship of amino acid residues that are close 
to one another in the linear sequence. Some of these steric relationships are of 
a regular kind, giving rise to a periodic structure. The α helix, the β pleated 
sheet, and the collagen helix are examples of secondary structure. Tertiary 
structure refers to the steric relationship of amino acid residues that are far 
apart in the linear sequence. It should be noted that the dividing line between 
secondary and tertiary structure is arbitrary. Proteins that contain more than 
one polypeptide chain display an additional level of structural organization, 
namely quaternary structure, which refers to the way in which the chains are 
packed together. Each polypeptide chain in such a protein is called a subunit. 
Another useful term is domain, which refers to a compact globular unit of protein 
structure. Many proteins fold into domains having masses that range from 10 to 
20 kdal. The domains of large proteins are usually connected by relatively 
flexible regions of polypeptide chain. 



H— C— O-OH 
Performic acid 



AMINO ACID SEQUENCE SPECIFIES 
THREE-DIMENSIONAL STRUCTURE 

Insight into the relation between the amino acid sequence of a protein and its 
conformation came from the work of Christian Anfinsen on ribonuclease, an enzyme 
that hydrolyzes RNA. Ribonuclease is a single polypeptide chain consisting of 124 
amino acid residues (Figure 2-43). It contains four disulfide bonds, which can 
be irreversibly oxidized by performic acid to give cysteic acid residues 



WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




Docket No.: PF-0300-3 CON 
USSN: 09/745,506 
Ref.No. 1 of 19 



PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6 : 
C12Q 1/68 



A1 



(11) International Publication Number: WO 95/21944 

(43) International Publication Date: 17 August 1995 (17.08.95) 



(21) International Application Number: PCT/US95/01863 

(22) International Filing Date: 14 February 1995 (14.02.95) 



(30) Priority Data: 

08/195,485 



14 February 1994 (14.02.94) US 



(60) Parent Application or Grant 

(63) Related by Continuation 

US 08/195,485 (CIP) 

Filed on 14 February 1994 (14.02.94) 



(71) Applicant (for all designated States except US): SMITHKLINE 

BEECHAM CORPORATION [US/US]; Corporate Intellec- 
tual Property, UW2220, 709 Swedeland Road, P.O. Box 
1539, King of Prussia, PA 19406-0939 (US). 

(72) Inventors; and 

(75) Inventors/Applicants (for US only): ROSENBERG, Martin 
[US/US]; 241 Mingo Road, Royersford, PA 19468 (US). 
DEBOUCK, Christine [BE/US]; 667 Pugh Road, Wayne, 
PA 19087 (US). BERGSMA, Derk [US/US]; 271 Irish Road, 
Berwyn, PA 19312 (US). 



(74) Agents: JERVIS, Herbert, H. et al.; SmithKline Beecham 
Corporation, Corporate Intellectual Property, UW2220, 709 
Swedeland Road, P.O. Box 1539, King of Prussia, PA 
19406-0939 (US). 



(81) Designated States: JP, US, European patent (AT, BE, CH, DE, 
DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE). 



Published 

With international search report. 



(54) Title: DIFFERENTIALLY EXPRESSED GENES IN HEALTHY AND DISEASED SUBJECTS 



(57) Abstract 



The present invention involves methods and compositions for identifying genes which are differentially expressed in a normal healthy 
animal and an animal having a selected disease or infection, and methods for diagnosing diseases or infections characterized by the presence 
of those genes, despite the absence of knowledge about the gene or its function. The methods involve the use of a composition suitable 
for use in hybridization which consists of a solid surface on which is immobilized at pre-defined regions thereon a plurality of defined 
oligonucleotide/polynucleotide sequences for hybridization. Each sequence comprises a fragment of an EST isolated from an identified 
DNA library prepared from tissue or cell samples of a healthy animal, an animal with a selected disease or infection, and any combination 
thereof. Differences in hybridization patterns produced through use of this composition and the specified methods enable diagnosis of 
disease based on differential expression of genes of unknown function, and enable the identification of those genes and the proteins encoded 
thereby. 
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differentially expressed genes in healthy and diseased subjects 

Cross Reference to Related Applications: 
This application is a continuation-in-part application of U.S. Serial No. 
08/195,485 filed February 14, 1994, the contents of which are incorporated herein by 
reference. 

Field of the Invention 

The present invention relates to the use of immobilized 
oligonucleotide/polynucleotide or polynucleotide sequences for the identification, 
sequencing and characterization of genes which are implicated in disease, infection, 
or development and the use of such identified genes and the proteins encoded thereby 
in diagnosis, prognosis, therapy and drug discovery. 

Background of the Invention 

Identification, sequencing and characterization of genes, especially 
human genes, is a major goal of modern scientific research. By identifying genes, 
determining their sequences and characterizing their biological function, it is possible 
to employ recombinant DNA technology to produce large quantities of valuable "gene 
products", e.g., proteins and peptides. Additionally, knowledge of gene sequences 
can provide a key to diagnosis, prognosis and treatment of a variety of disease states 
in plants and animals which are characterized by inappropriate expression and/or 
repression of selected gene(s) or by the influence of external factors, e.g., carcinogens 
or teratogens, on gene function. The term disease-associated gene(s) is used herein 
in its broadest sense to mean not only genes associated with classical inherited 
diseases, but also those associated with genetic predisposition to disease as well as 
infectious or pathogenic states resulting from gene expression by infectious agents or 
the effect on host cell gene expression by the presence of such a pathogen or its 
products. Locating disease-associated genes will permit the development of 
diagnostic and prognostic reagents and methods, as well as possible therapeutic 
regimens, and the discovery of new drugs for treating or preventing the occurrence of 
such diseases. 

Methods have been described for the identification of certain novel 
gene sequences, referred to as Expressed Sequence Tags (EST) [see, e.g., Adams et 
al, Science, 252:1651-1656 (1991); and International Patent Application No. 
WO93/00353, published January 7, 1993]. Conventionally, an EST is a specific cDNA 
polynucleotide sequence, or tag, about 150 to 400 nucleotides in length, derived from 
a messenger RNA molecule by reverse transcription, which is a marker for, and 
component of, a human gene actually transcribed in vivo. However, as used herein an 
EST also refers to a genomic DNA fragment derived from an organism, such as a 
microorganism, the DNA of which lacks intron regions. 
A variety of techniques have been described for identifying particular 
gene sequences on the basis of their gene products. For example, several techniques 
are described in the art [see, e.g., International Patent Application No. WO91/07087, 
published May 30, 1991]. Additionally, known methods exist for the amplification of 
desired sequences [see, e.g., International Patent Application No. WO91/17271, 
published November 14, 1991, among others]. 

However, at present, there exist no established methods for filling the 
need in the art for methods and reagents which employ fragments of differentially 
expressed genes of known, unknown (or previously unrecognized) function or 
consequence to provide diagnostic and therapeutic methods and reagents for diagnosis 
and treatment of disease or infection, which conditions are characterized by such 
genes and gene products. It should be appreciated that it is the expression differences 
that are diagnostic of the altered state (e.g., predisease, disease, pathogenic, 
progression or infectious). Such genes associated with the altered state are likely to 
be the targets of drug discovery, whether the genes are the cause or the effect of the 
condition. Identification of such genes provides insight into which gene expression 
needs to be re-altered in order to reestablish the healthy state. 

Summary of the Invention 

In one aspect, the invention provides methods for identifying gene(s) 
which are differentially expressed, for example, in a normal healthy organism and an 
organism having a disease. The method involves producing and comparing 
hybridization patterns formed between samples of expressed mRNA or cDNA 
polynucleotide sequences obtained from either analogous cells, tissues or organs of a 
healthy organism and a diseased organism and a defined set of 
oligonucleotide/polynucleotide sequence probes from either a 
healthy organism or a diseased organism immobilized on a support. Those defined 
oligonucleotide/polynucleotide sequences are representative of the total expressed 
genetic component of the cells, tissues, organs or organism as defined by the collection 
of partial cDNA sequences (ESTs). The differences between the hybridization 
patterns permit identification of those particular EST or gene-specific 
oligonucleotide/polynucleotide sequences associated with differential expression, and 
the identification of the EST permits identification of the clone from which it was 
derived and, using ordinary skill, further cloning and, if desired, sequencing of the 
full-length cDNA and genomic counterpart, i.e., gene, from which it was obtained. 

In another aspect, the invention provides methods substantially similar 
to those described above, but which permit identification of those gene(s) of a 
pathogen which are expressed in any biological sample of an infected organism based 
on comparative hybridization of RNA/cDNA samples derived from a healthy versus 
infected organism, hybridized to an oligonucleotide/polynucleotide set representative 
of the gene coding complement of the pathogen of interest. 

In another aspect, the invention provides methods substantially similar 
to those described above, but which permit identification of those EST-specific 
oligonucleotide/polynucleotide sequences of host gene(s) which represent genes being 
differentially expressed/altered in expression by the disease state or infection and are 
expressed in any biological sample of an infected organism based on comparative 
hybridization of RNA/cDNA samples derived from a healthy versus infected 
organism of interest. 

In a further aspect, the methods described above and in detail below 
also provide methods for diagnosis of diseases or infections characterized by 
differentially expressed genes, the expression of which has been altered as a result of 
infection by the pathogen or disease causing agent in question. All identified 
differences provide the basis for diagnostic testing, be it the altered expression of 
endogenous genes or the patterned expression of the genes of the infecting organism. 
Such patterns of altered expression are defined by comparing RNA/cDNA from the 
two states hybridized against a panel of oligonucleotide/polynucleotides representing 
the expressed gene component of a cell, tissue, organ or organism as defined by its 
collection of ESTs. 

Yet a further aspect of this invention provides a composition suitable 
for use in hybridization, which comprises a solid surface on which is immobilized at 
pre-defined regions thereon a plurality of defined oligonucleotide/polynucleotide 
sequences for hybridization, each sequence comprising a fragment of an EST isolated 
from a cDNA or DNA library prepared from at least one selected tissue or cell 
sample of a healthy (i.e., pre-disease state) animal, at least one analogous sample of 
an animal having a disease, at least one analogous sample of an animal infected with a 
pathogen or the pathogen itself, or any combination or multiple combinations thereof. 

An additional aspect of the invention provides an isolated gene 
sequence which is differentially expressed in a normal healthy animal and an animal 
having a disease, and is identified by the methods above. Similarly, an isolated 
pathogen gene sequence which is expressed in tissue or cell samples of an infected 
animal can be identified by the methods above. 

Yet another aspect of the invention is that it provides not only a means 
for a static diagnostic but also a means for carrying out the procedure over 
time to measure disease progression as well as monitoring the efficacy of disease 
treatment regimes, including any toxicological effects thereof. 
Another aspect of the invention is an isolated protein produced by 
expression of the gene sequences identified above. Such proteins are useful in 
therapeutic compositions or diagnostic compositions, or as targets for drug 
development. 

Other aspects and advantages of the present invention are described 
further in the following detailed description of the preferred embodiments thereof. 

Detailed Description of the Invention 

The present invention meets the unfulfilled needs in the art by 
providing methods for the identification and use of gene fragments and genes, even 
those of unknown full length sequence and unknown function, which are 
differentially expressed in a healthy animal and in an animal having a specific disease 
or infection by use of ESTs derived from DNA libraries of healthy and/or 
diseased/infected animals. Employing the methods of this invention permits the 
resulting identification and isolation of such genes by using their corresponding ESTs 
and thereby also permits the production of protein products encoded by such genes. 
The genes themselves and/or protein products, if desired, may be employed in the 
diagnosis or therapy of the disease or infection with which the genes are associated 
and in the development of new drugs therefor. 

It has been appreciated that one or more differentially identified EST 
or gene-specific oligonucleotide/polynucleotides define a pattern of differentially 
expressed genes diagnostic of a predisease, disease or infective state. A knowledge of 
the specific biological function of the EST is not required, only that the ESTs 
identify a gene or genes whose altered expression is associated reproducibly with 
the predisease, disease or infectious state. The differences permit the identification of 
gene products altered in their expression by the disease and represent those products 
most likely to be targets of therapeutic intervention. Similarly, the product may be of 
the infecting organism itself and also be an effective target of intervention. 

I. Definitions. 

Several words and phrases used throughout this specification are 
defined as follows: 

As used herein, the term "gene" refers to the genomic nucleotide 
sequence from which a cDNA sequence is derived, which cDNA produces an EST, as 
described below. The term gene classically refers to the genomic sequence, which, 
upon processing, can produce different cDNAs, e.g., by splicing events. However, 
for ease of reading, any full-length counterpart cDNA sequence which gives rise to an 
EST will also be referred to by shorthand herein as a 'gene'. 
The term "organism" includes, without limitation, microbes, plants and 
animals. 

The term "animal" is used in its broadest sense to include all members 
of the animal kingdom, including humans. It should be understood, however, that 
according to this invention the same species of animal which provides the biological 
sample also is the source of the defined immobilized oligonucleotide/polynucleotides 
as defined below. 

The term "pathogen" is defined herein as any molecule or organism 
which is capable of infecting an animal or plant and replicating its nucleic acid 
sequences in the cells or tissues of that animal or plant. Such a pathogen is generally 
associated with a disease condition in the infected animal or plant. Such pathogens 
may include viruses, which replicate intra- or extra-cellularly, or other organisms, 
such as bacteria, fungi or parasites, which generally infect tissues or the blood. 
Certain pathogens or microorganisms are known to exist in sequential and 
distinguishable stages of development, e.g., latent stages, infective stages, and stages 
which cause symptomatic diseases. In these different stages, the pathogens are 
anticipated to express differentially certain genes and/or turn on or off host cell gene 
expression. 

As used herein, the term "disease" or "disease state" refers to any 
condition which deviates from a normal or standardized healthy state in an organism 
of the same species in terms of differential expression of the organism's genes. In 
other words, a disease state can be any illness or disorder, be it of genetic or 
environmental origin, for example, an inherited disorder such as certain breast 
cancers, or a disorder which is characterized by expression of gene(s) normally in an 
inactive, 'turned off' state in a healthy animal, or a disorder which is characterized by 
under-expression or no expression of gene(s) which is normally activated or 'turned 
on' in a normal healthy animal. Such differential expression of genes may also be 
detected in a condition caused by infection, inflammation, or allergy, a condition 
caused by development or aging of the animal, a condition caused by administration 
of a drug or exposure of the animal to another agent, e.g., nutrition, which affects 
gene expression. Essentially, the methods described herein can be adapted to detect 
differential gene expression resulting from any cause, by manipulation of the defined 
oligonucleotide/polynucleotides and the samples tested as described below. The 
concept of disease or disease state also includes its temporal aspects in terms of 
progression and treatment. 

The phrase "differentially expressed" refers to those situations in 
which a gene transcript is found in differing numbers of copies, or in activated vs. 
inactivated states, in different cell types or tissue types of an organism having a 
selected disease as contrasted to the levels of the gene transcript found in the same 
cells or tissues of a healthy organism. Genes may be differentially expressed in 
differing states of activation in microorganisms or pathogens in different stages of 
development. For example, multiple copies of gene transcripts may be found in an 
organism having a selected disease, while only one, or significantly fewer copies, of 
the same gene transcript are found in a healthy organism, or vice-versa. 
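
As a purely illustrative sketch of this definition (not a procedure taken from the specification), differential expression between a healthy and a diseased sample might be flagged by comparing per-probe signals. The function name, the probe identifiers, the signal values, the two-fold threshold and the background floor are all hypothetical assumptions introduced here.

```python
def differentially_expressed(healthy, diseased, min_fold=2.0, floor=1.0):
    """Flag probes whose diseased/healthy signal ratio differs by at least min_fold.

    healthy, diseased: dicts mapping a probe identifier to a hybridization signal.
    floor: assumed background level, used to avoid dividing by zero.
    Returns a dict mapping each flagged probe to its fold change.
    """
    flagged = {}
    for probe in healthy.keys() & diseased.keys():
        h = max(healthy[probe], floor)
        d = max(diseased[probe], floor)
        fold = d / h
        # Keep probes that are over- or under-represented in the diseased sample.
        if fold >= min_fold or fold <= 1.0 / min_fold:
            flagged[probe] = fold
    return flagged


# Hypothetical signals for three probes.
healthy = {"EST_A": 100.0, "EST_B": 50.0, "EST_C": 0.0}
diseased = {"EST_A": 110.0, "EST_B": 400.0, "EST_C": 30.0}
flagged = differentially_expressed(healthy, diseased)
print(sorted(flagged))
```

Here EST_A changes by only about 10% and is not flagged, while EST_B (more copies in the diseased sample) and EST_C (absent in the healthy sample) are. The symmetric test on the ratio also catches the "vice-versa" case in which the healthy sample carries more copies.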

As used herein, the term "solid support" refers to any known substrate 
which is useful for the immobilization of large numbers of 
oligonucleotide/polynucleotide sequences by any available method to enable 
detectable hybridization of the immobilized oligonucleotide/polynucleotide sequences 
with other polynucleotide sequences in a sample. Among a number of available solid 
supports, one desirable example is the supports described in International Patent 
Application No. WO91/07087, published May 30, 1991. Also useful are supports such 
as but not limited to nitrocellulose, mylein, glass, silica and Pall Biodyne C®. It is 
also anticipated that improvements yet to be made to conventional solid supports may 
also be employed in this invention. 

The term "surface" means any generally two-dimensional structure on 
a solid support to which the desired oligonucleotide/polynucleotide sequence is 
attached or immobilized. A surface may have steps, ridges, kinks, terraces and the 
like. 

As used herein, the term "predefined region" refers to a localized area 
on a surface of a solid support on which is immobilized one or multiple copies of a 
particular oligonucleotide/polynucleotide sequence and which enables the 
identification of the oligonucleotide/polynucleotide at the position, if hybridization of 
that oligonucleotide/polynucleotide to a sample polynucleotide occurs. 

The term "immobilized" refers to the attachment of the 
oligonucleotide/polynucleotide to the solid support. Means of immobilization are 
known and conventional to those of skill in the art, and may depend on the type of 
support being used. 

By "EST" or "Expressed Sequence Tag" is meant a partial DNA or 
cDNA sequence of about 150 to 500, more preferably about 300, sequential 
nucleotides of a longer sequence obtained from a genomic or cDNA library prepared 
from a selected cell, cell type, tissue or tissue type, organ or organism, which longer 
sequence corresponds to an mRNA of a gene found in that library. An EST is 
generally DNA. One or more libraries made from a single tissue type typically 
provide at least about 3000 different (i.e., unique) ESTs and potentially the full 
complement of all possible ESTs representing all cDNAs, e.g., 50,000-100,000 in an 
animal such as a human. Further background and information on the construction of 
ESTs is described in M. D. Adams et al, Science, 252:1651-1656 (1991); and 
International Application Number PCT/US92/05222 (January 7, 1993). 

As used herein, the term "defined oligonucleotide/polynucleotide 
sequence" refers to a known nucleotide sequence fragment of a selected EST or gene. 
This term is used interchangeably with the term "fragments of EST". These 
sequential sequences are generally comprised of between about 15 to about 45 
nucleotides and more preferably between about 20 to about 25 nucleotides in length. 
Thus any single EST of 300 nucleotides in length may provide about 280 different 
defined oligonucleotide/polynucleotide sequences of 20 nucleotides in length (e.g., 
20-mers). The lengths of the defined oligonucleotide/polynucleotides may be readily 
increased or decreased as desired or needed, depending on the limitations of the solid 
support on which they may be immobilized or the requirements of the hybridization 
conditions to be employed. The length is generally guided by the principle that it 
should be of sufficient length to ensure that it is, on average, represented only once in 
the population to be examined. Generally, these defined 
oligonucleotide/polynucleotides are RNA or DNA and are preferably derived from 
the anti-sense strand of the EST sequence or from a corresponding mRNA sequence 
to enable their hybridization with samples of RNA or DNA. Modified nucleotides 
may be incorporated to increase stability and hybridization properties. 
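
The figure of "about 280" 20-mers per 300-nucleotide EST follows from counting overlapping windows: a sequence of length L contains L - k + 1 contiguous fragments of length k, i.e. 300 - 20 + 1 = 281. The sketch below illustrates that count and the anti-sense derivation under stated assumptions: the function names and the randomly generated toy sequence are hypothetical and are not part of the specification.

```python
import random


def defined_fragments(est, k=20):
    """Return every contiguous k-nucleotide window of an EST, in order."""
    return [est[i:i + k] for i in range(len(est) - k + 1)]


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(seq):
    """Return the reverse complement (anti-sense strand) of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Toy 300-nucleotide EST; the sequence itself is arbitrary.
rng = random.Random(0)
est = "".join(rng.choice("ACGT") for _ in range(300))

fragments = defined_fragments(est, k=20)
probes = [antisense(f) for f in fragments]
print(len(fragments))
```

Whether all overlapping windows or only a subset would be immobilized is a design choice that the text leaves open ("may be selected or designed as desired"); the sketch simply enumerates the maximum.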

By the term "plurality of defined oligonucleotide/polynucleotide 
sequences" is meant the following. A surface of a solid support may immobilize a 
large number of "defined oligonucleotide/polynucleotides". For example, depending 
upon the nature of the surface, it can immobilize from about 300 to upwards of 
60,000 defined 20-mer oligonucleotide/polynucleotides. It is anticipated that future 
improvements to solid surfaces will permit considerably larger such pluralities to be 
immobilized on a single surface. A "plurality" of sequences refers to the use on any 
one solid support of multiple different defined oligonucleotide/polynucleotides from a 
single EST from a selected library, as well as multiple different defined 
oligonucleotide/polynucleotides from different ESTs from the same library or many 
libraries from the same or different tissues, and may also include multiple identical 
copies of defined oligonucleotide/polynucleotides. Ultimately a plurality has at least 
one oligonucleotide/polynucleotide per expressed gene in the entire organism. For 
example, from a library producing about 5,000-10,000 ESTs, a single support can 
include at least about 1-20 defined oligonucleotide/polynucleotides representing every 
EST in that library. The composition of defined oligonucleotide/polynucleotides 
which make up a surface according to this invention may be selected or designed as 
desired. 

The term "sample" is employed in the description of this invention in 
several important ways. As used herein, the term "sample" encompasses any cell or 
tissue from an organism. Any desired cell or tissue type in any desired state may be 
selected to form a sample. For example, the sample cell desired may be a human T 
cell; the desired cell type for use in this invention may be a quiescent T cell or an 
activated T cell. 

By the phrase "analogous sample" or "analogous cell or tissue" is
meant that according to this invention when the ESTs which provide the defined
oligonucleotide/polynucleotides are produced from a cDNA library prepared from a
single tissue or cell type source sample, e.g., liver tissue of a human, then the samples
used to hybridize to those immobilized defined oligonucleotide/polynucleotides are
preferably provided by the same type of sample from either a healthy or diseased
animal, i.e., liver tissue of a healthy human and liver tissue of a diseased or infected
human or from a human suspected of having that disease or infection. Alternatively,
if the surface contains defined oligonucleotide/polynucleotides from multiple cells or
tissues, then the "samples" which are hybridized thereto can be but are not limited to
samples obtained from analogous multiple tissues or cells.

By the term "detectably hybridizing" is meant that the sample from the
healthy organism or diseased or infected organism is contacted with the defined
oligonucleotide/polynucleotides on the surface for sufficient time to permit the
formation of patterns of hybridization on the surfaces caused by hybridization
between certain polynucleotide sequences in the samples with the certain immobilized
defined oligonucleotide/polynucleotides. These patterns are made detectable by the
use of available conventional techniques, such as fluorescent labelling of the samples.
Preferably hybridization takes place under stringent conditions, e.g., revealing
homologies of about 95%. However, if desired, other less stringent conditions may
be selected. Techniques and conditions for hybridization at selected stringencies are
well known in the art [see, e.g., Sambrook et al, Molecular Cloning. A Laboratory
Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989)].

II. Compositions of the Invention

The present invention is based upon the use of ESTs from any desired 
cell or tissue in known technologies for oligonucleotide/polynucleotide hybridization.

A. ESTs

An EST, as defined above, is, for an animal, a sequence from a
cDNA clone that corresponds to an mRNA. The EST sequences useful in the present
invention are isolated preferably from cDNA libraries using a rapid screening and
sequencing technique. Custom made cDNA libraries are made using known
techniques. See, generally, Sambrook et al, cited above. Briefly, mRNA from a
selected cell or tissue is reverse transcribed into complementary DNA (cDNA) using
the reverse transcriptase enzyme and made double-stranded using RNase H coupled
with DNA polymerase or reverse transcriptase. Restriction enzyme sites are added to
the cDNA and it is cloned into a vector. The result is a cDNA library. Alternatively,
commercially available cDNA libraries may be used. Libraries of cDNA can also be
generated from recombinant expression of genomic DNA using known techniques,
including polymerase chain reaction-derived techniques.

ESTs (which can range from about 150 to about 500 nucleotides in
length, preferably about 300 nucleotides) can be obtained through sequence analysis
from either end of the cDNA insert. Desirably, the DNA libraries used to obtain
ESTs use directional cloning methods so that either the 5' end of the cDNA (likely to
contain coding sequence) or the 3' end (likely to be a non-coding sequence) can be
selectively obtained.

In general, the method for obtaining ESTs comprises applying
conventional automated DNA sequencing technology to screen clones,
advantageously randomly selected clones, from a cDNA library. The cDNA libraries
from the desired tissue can be preprocessed, or edited, by conventional techniques to
reduce repeated sequencing of high and intermediate abundance clones and to
maximize the chances of finding rare messages from specific cell populations.
Preferably, preprocessing includes the use of defined composition prescreening
probes, e.g., cDNA corresponding to mitochondria, abundant sequences, ribosomes,
actins, myelin basic polypeptides, or any other known high abundance peptide. These
prescreening probes used for preprocessing are generally derived from known ESTs.
Other useful preprocessing techniques include subtraction hybridization, which
preferentially reduces the population of highly represented sequences in the library
[e.g., see Fargnoli et al, Anal. Biochem. 187:364 (1990)] and normalization, which
results in all sequences being represented in approximately equal proportions in the
library [Patanjali et al, Proc. Natl. Acad. Sci. USA 88:1943 (1991)]. Additional
prescreening/differential screening approaches are known to those skilled in the art.

ESTs can then be generated from partial DNA sequencing of the 
selected clones. The ESTs useful in the present invention are preferably generated 
using low redundancy of sequencing, typically a single sequencing reaction. While
single sequencing reactions may have an accuracy as low as 90%, this nevertheless
provides sufficient fidelity for identification of the sequence and design of PCR
primers.

If desired, the location of an EST in a full length cDNA is determined
by analyzing the EST for the presence of coding sequence. A conventional computer
program is used to predict the extent and orientation of the coding region of a
sequence (using all six reading frames). Based on this information, it is possible to
infer the presence of start or stop codons within a sequence and whether the sequence
is completely coding or completely non-coding or a combination of the two. If start
or stop codons are present, then the EST can cover both part of the 5'-untranslated or
3'-untranslated part of the mRNA (respectively) as well as part of the coding
sequence. If no coding sequence is present, it is likely that the EST is derived from
the 3' untranslated sequence due to its longer length and the fact that most cDNA
library construction methods are biased toward the 3' end of the mRNA. It should be
understood that both coding and non-coding regions may provide ESTs equally useful
in the described invention.

A number of specific ESTs suitable for use in the present
invention are described in Adams et al (supra), which may be incorporated by
reference herein, to describe non-essential examples of desirable ESTs. Other ESTs
exist in the art which may also be useful in this invention, as will ESTs yet to be
developed by these known techniques.

B. Preparing the Solid Support of the Invention 

Oligonucleotide sequences which are fragments of defined
sequence are derived from each EST by conventional means, e.g., conventional
chemical synthesis or recombinant techniques. Each defined
oligonucleotide/polynucleotide sequence as described above is a fragment, which can
be, but is not necessarily, an anti-sense fragment, of an EST isolated from a DNA library
prepared from a selected cell or tissue type from a selected animal. For use in the
present invention, it is presently preferred that the defined
oligonucleotide/polynucleotide sequences are 20-25mers. As described above, for
each EST a number of such 20-25mers may be generated. The lengths may vary as
described above as well as the composition. For example,
oligonucleotide/polynucleotides can be modified based on the Oligo 4.0 or similar
programs to predict hybridization potential or to include modified nucleotides for the
reasons given above. It is also appreciated that large DNA segments may be
employed including entire ESTs or even full length genes, particularly when inserted
into cloning vectors.

A plurality of these defined oligonucleotide/polynucleotide
sequences are then attached to a selected solid support conventionally used for the
attachment of nucleotide sequences again by known means. In contrast to other
technologies available in the art, this support is designed to contain defined, not
random, oligonucleotide/polynucleotide sequences. The EST fragments, or defined
oligonucleotide/polynucleotide sequences, immobilized on the solid support can
include fragments of one or more ESTs from a library of at least one selected tissue
or cell sample of a healthy animal, at least one analogous sample of the animal having
a disease, at least one analogous sample of the animal infected with a pathogen, and
any combination thereof.
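
By way of illustration only, the derivation of defined 20-25mer fragments from an EST, optionally as anti-sense fragments, may be sketched as follows. The function names, the non-overlapping tiling scheme and the default step are assumptions made for the example; no particular fragment selection scheme is prescribed herein, and hybridization potential (as predicted by Oligo 4.0 or similar programs) is not modelled.

```python
# Illustrative tiling of an EST into defined fixed-length fragments.
# Assumptions: non-overlapping tiles by default; names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(seq):
    """Return the anti-sense (reverse complement) of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def defined_oligos(est, length=20, step=20, as_antisense=True):
    """Cut an EST into fragments of the given length, taken every `step`
    nucleotides; trailing nucleotides that do not fill a fragment are dropped."""
    est = est.upper()
    fragments = [est[i:i + length]
                 for i in range(0, len(est) - length + 1, step)]
    if as_antisense:
        fragments = [antisense(f) for f in fragments]
    return fragments
```

For a 300 nucleotide EST, `defined_oligos(est, length=25, step=25)` would yield twelve 25mers; a smaller `step` gives overlapping fragments.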

Numerous conventional methods are employed for attaching
biological molecules such as oligonucleotide/polynucleotide sequences to surfaces of
a variety of solid supports. See, e.g., Affinity Techniques. Enzyme Purification: Part
B. Methods in Enzymology, Vol. 34, ed. W.B. Jakoby, M. Wilcheck, Acad. Press,
NY (1974); Immobilized Biochemicals and Affinity Chromatography, Advances in
Experimental Medicine and Biology, vol. 42, ed. R. Dunlap, Plenum Press, NY
(1974); U. S. Patent No. 4,762,881; U. S. Patent No. 4,542,102; European Patent
Publication No. 391,608 (October 10, 1990); U. S. Patent No. 4,992,127 (Nov. 21,
1989).

One desirable method for attaching
oligonucleotide/polynucleotide sequences derived from ESTs to a solid support is
described in International Application No. PCT/US90/06607 (published May 30,
1991). Briefly, this method involves forming predefined regions on a surface of a
solid support, where the predefined regions are capable of immobilizing ESTs. The
methods make use of binding substances attached to the surface which enable
selective activation of the predefined regions. Upon activation, these binding
substances become capable of binding and immobilizing
oligonucleotide/polynucleotides based on EST or longer gene sequences.

Any of the known solid substrates suitable for binding
oligonucleotide/polynucleotides at pre-defined regions on the surface thereof for
hybridization and methods for attaching the oligonucleotide/polynucleotides thereto
may be employed by one of skill in the art according to this invention. Similarly,
known conventional methods for making hybridization of the immobilized
oligonucleotide/polynucleotides detectable, e.g., fluorescence, radioactivity,
photoactivation, biotinylation, solid state circuitry, and the like may be used in this
invention.

Thus, by resorting to known techniques, the invention provides
a composition suitable for use in hybridization which consists of a surface of a solid
support on which is immobilized at pre-defined regions on said surface a plurality of
defined oligonucleotide/polynucleotide sequences for hybridization. For example,
one composition of this invention is a solid support on which are immobilized oligos
of EST fragments from a library constructed from a single cell type, e.g., a human
stem cell, or a single tissue, e.g., human liver, from a healthy human. Still another
composition of this invention is another solid support on which are immobilized
oligos of EST fragments from a library constructed from a single cell type or a tissue
from a human having a selected disease or predisposition to a selected disease, e.g.,
liver cancer.

Another embodiment of the compositions of this invention
includes a single solid support having oligonucleotides of ESTs from both single cell
or single tissue libraries from both a healthy and diseased human. Still other
embodiments include a single support on which are immobilized oligos of EST
fragments from more than one tissue or cell library from a healthy human or a single
support on which are immobilized more than one tissue or cell library from both
healthy and diseased animals or humans. A preferred composition of this invention is
anticipated to be a single support containing oligos of ESTs for all known cells and
tissues from a selected organism.

III. The Methods of the Invention

A. Identification of Genes 

The present invention employs the compositions described
above in methods for identifying genes which are differentially expressed in a normal
healthy organism and an organism having a disease or infection. These methods may
be employed to detect such genes, regardless of the state of knowledge about the
function of the gene. The method of this invention by use of the compositions
containing multiple defined EST fragments from a single gene as described above is
able to detect levels of expression of genes or in other cases simply the expression or
lack thereof, which differ between normal, healthy organisms and organisms having a
selected disease, disorder or infection.

One such method employs a first surface of a solid support on
which is immobilized at pre-defined regions thereon a plurality of defined
oligonucleotide/polynucleotide sequences, described above, of EST or longer gene
fragment isolated from a cDNA library prepared from at least one selected tissue or
cell sample of a healthy animal (the "healthy test surface") and a second such surface
on which is immobilized at pre-defined regions a plurality of defined
oligonucleotide/polynucleotide sequences of EST or longer gene fragment isolated
from at least one analogous tissue of an animal having a selected disease (the "disease
test surface"). These test surfaces may be standardized for the selected animal or
selected cell or tissue sample from that animal (i.e., they are prescreened for
polymorphisms in the species population).

Polynucleotide sequences are then isolated from mRNA and/or
cDNA from a biological sample from a known healthy animal ("healthy control") and
a second sample is similarly prepared from a sample from a known diseased animal
("disease sample"). These two samples are desirably selected from the cell or tissue
analogous to that which provided the immobilized oligonucleotide/polynucleotides.

According to the method, the healthy control sample is
contacted with one set of the healthy test surface and the disease test surface
described above for a time sufficient to permit detectable hybridization to occur
between the sample and the immobilized defined oligonucleotide/polynucleotides on
each surface. The results of this hybridization are a first hybridization pattern formed
between the nucleotides of the healthy control sample and the healthy test surface and
a second hybridization pattern formed between the nucleotides of the healthy control
sample and the disease test surface.

In a similar manner, the disease sample is detectably hybridized
to another set of healthy test and disease test surfaces, forming a third hybridization
pattern between the disease sample and healthy test surface and a fourth hybridization
pattern between the disease sample and the disease test surface.

Comparing the four hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy control and the disease sample by the presence of differences in
the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding EST or longer gene
fragment from which the oligonucleotide/polynucleotides are obtained.
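
By way of illustration only, the comparison of the four hybridization patterns may be sketched as follows. Each pattern is represented, as an assumption made for the example, as a mapping from a pre-defined region to a measured signal; the two-fold ratio and the signal floor are arbitrary illustrative values, since no numerical criterion for a "difference" is specified herein. The function names are hypothetical.

```python
# Illustrative comparison of the four hybridization patterns.
# Assumptions: patterns are dicts of region -> signal; a two-fold change
# counts as a difference; signals below `floor` are raised to `floor`.
def differing_regions(pattern_a, pattern_b, min_ratio=2.0, floor=1.0):
    """Return {region: ratio} for regions present in both patterns whose
    signal in pattern_b differs from pattern_a by at least min_ratio."""
    hits = {}
    for region in pattern_a.keys() & pattern_b.keys():
        a = max(pattern_a[region], floor)
        b = max(pattern_b[region], floor)
        ratio = b / a
        if ratio >= min_ratio or ratio <= 1.0 / min_ratio:
            hits[region] = ratio
    return hits


def compare_four_patterns(first, second, third, fourth, **kwargs):
    """first/second: healthy control on the healthy/disease test surface;
    third/fourth: disease sample on the healthy/disease test surface."""
    return {
        "healthy_test_surface": differing_regions(first, third, **kwargs),
        "disease_test_surface": differing_regions(second, fourth, **kwargs),
    }
```

Each region reported can then be traced back to the EST or longer gene fragment from which the immobilized oligonucleotide/polynucleotide at that region was obtained.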

In another embodiment of the method of this invention, the
same process is employed, with the exception that the plurality of defined
oligonucleotide/polynucleotide sequences forming the healthy test sample and the
disease test sample surfaces are immobilized on a single solid support. For example,
each fragment of an EST or longer gene fragment on the surface is isolated from at
least two cDNA libraries prepared from a selected cell or tissue sample of a healthy
animal and an analogous selected cell or tissue sample of an animal having a disease.

According to this embodiment, the healthy control sample is
detectably hybridized to a copy of this single solid surface, forming one hybridization
pattern with oligonucleotide/polynucleotides associated with both the healthy and
diseased animal. Similarly, the disease sample is detectably hybridized to a second
copy of this single solid surface, forming one hybridization pattern with
oligonucleotide/polynucleotides associated with both the healthy and diseased animal.

Comparing the two hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy control and the disease sample by the presence of differences in
the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding EST or longer gene
fragment from which the oligonucleotide/polynucleotides are obtained.

The identification of one or more ESTs as the source of the
defined oligonucleotide/polynucleotide which produced a "difference" in
hybridization patterns according to these methods permits ready identification of the
gene from which those ESTs were derived. Because oligonucleotides are of sufficient
length that they will hybridize under stringent conditions only with an RNA/cDNA for
that gene to which they correspond, the oligo can be used to identify the EST and in
turn the clone from which it was derived and by subsequent cloning, obtain the
sequence of the full-length cDNA and its genomic counterparts, i.e., the gene, from
which it was obtained.

In other words, the ESTs identified by the method of this
invention can be employed to determine the complete sequence of the mRNA, in the
form of transcribed cDNA, by using the EST as a probe to identify a cDNA clone
corresponding to a full-length transcript, followed by sequencing of that clone. The
EST or the full length cDNA clone can also be used as a probe to identify a genomic
clone or clones that contain the complete gene including regulatory and promoter
regions, exons, and introns.

It should be appreciated that one does not have to be restricted
in using ESTs from a particular tissue from which probe RNA or cDNA is obtained;
rather, any or all ESTs (known or unknown) may be placed on the support.
Hybridization will be used to form diagnostic patterns or to identify which particular
EST is detected. For example, all known ESTs from an organism are used to produce
a "master" solid support to which control sample and disease samples are alternately
hybridized. One then detects a pattern of hybridization associated with the particular
disease state which then forms the basis of a diagnostic test or the isolation of
disease specific ESTs from which the intact gene may be cloned and sequenced,
leading ultimately to a defined therapeutic target.

Methods for obtaining complete gene sequences from ESTs are 
well-known to those of skill in the art. See, generally, Sambrook et al, cited above. 
Briefly, one suitable method involves purifying the DNA from the clone that was
sequenced to give the EST and labeling the isolated insert DNA. Suitable labeling
systems are well known to those of skill in the art [see, e.g., Basic Methods in
Molecular Biology, L. G. Davis et al, ed., Elsevier Press, NY (1986)]. The labeled
EST insert is then used as a probe to screen a lambda phage cDNA library or a
plasmid cDNA library, identifying colonies containing clones related to the probe
cDNA which can be purified by known methods. The ends of the newly purified
clones are then sequenced to identify full length sequences and complete sequencing
of full length clones is performed by enzymatic digestion or primer walking. A
similar screening and clone selection approach can be applied to clones from a
genomic DNA library.

Additionally, an EST or gene identified by this method as
associated with inherited disorders can be used to determine at what stage during
embryonic development the selected gene from which it is derived is developed by
screening embryonic DNA libraries from various stages of development, e.g., 2-cell,
8-cell, etc., for the selected gene. As has been mentioned above, the invention may
be applied in additional temporal modes for monitoring the progression of a disease
state, the efficacy of a particular treatment modality or the aging process of an
individual.

Thus, the methods of this invention permit the identification,
isolation and sequencing of a gene which is differentially expressed in a selected
disease/infection. As described in more detail below, the identified gene may then be
employed to obtain any protein encoded thereby, or may be employed as a target for
diagnostic methods or therapeutic approaches to the treatment of the disease,
including, e.g., drug development.

The same methods as described above for the identification of
genes, including genes of unknown function, which are differentially expressed in a
disease state, may also be employed to identify other genes of interest. For example,
another embodiment of this invention includes a method for identifying a gene of a
pathogen which is expressed in a biological sample of an animal infected with that
pathogen or the gene of the host which is altered in its expression as a result of the
infection.

One such method employs a healthy test surface as described
above, employing defined oligonucleotide/polynucleotides from a sample of a
healthy, uninfected animal. The second such surface has immobilized at pre-defined
regions thereon a plurality of defined oligonucleotide/polynucleotide sequences of
ESTs isolated from at least one analogous tissue or cell sample of an infected animal
(the "infection test surface"). Polynucleotide sequences are isolated from a biological
sample from a healthy animal ("healthy control") and a second sample is similarly
prepared from an animal infected with the selected pathogen ("infection sample").
These two samples are desirably selected from the cell or tissue analogous to that
which provided the immobilized oligonucleotide/polynucleotides. It would also be
possible to provide samples from the nucleic acid of the pathogen itself.

According to the method, the healthy control sample is
contacted with one set of the healthy test surface and the infection test surface
described above for a time sufficient to permit detectable hybridization to occur
between the sample and the immobilized defined oligonucleotide/polynucleotides on
each surface. The results of this hybridization are a first hybridization pattern formed
between the nucleotides of the healthy control sample and the healthy test surface and
a second hybridization pattern formed between the nucleotides of the healthy control
sample and the infection test surface.

In a similar manner, the infection sample is detectably
hybridized to another set of healthy test and infection test surfaces, forming a third
hybridization pattern between the infection sample and healthy test surface and a
fourth hybridization pattern between the infection sample and the infection test
surface.

Comparing the four hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy animal and the animal infected with the pathogen by the presence
of differences in the hybridization patterns at pre-defined regions. As mentioned,
differential expression is not required and simple qualitative analysis is possible by
reference to gene expression which is simply present or absent.
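
By way of illustration only, the simple qualitative (present or absent) analysis mentioned above may be sketched as follows. The detection threshold is an arbitrary illustrative value and the function names are hypothetical; how a signal is judged "present" is not specified herein.

```python
# Illustrative present/absent analysis of two hybridization patterns.
# Assumptions: patterns are dicts of region -> signal; a region is called
# "present" when its signal reaches an arbitrary detection threshold.
def presence_calls(pattern, detection_threshold=50.0):
    """Map each pre-defined region to True (present) or False (absent)."""
    return {region: signal >= detection_threshold
            for region, signal in pattern.items()}


def qualitative_differences(control, sample, detection_threshold=50.0):
    """Report regions detected in only one of the two patterns."""
    c = presence_calls(control, detection_threshold)
    s = presence_calls(sample, detection_threshold)
    return {region: ("present only in sample" if s[region]
                     else "present only in control")
            for region in c.keys() & s.keys()
            if c[region] != s[region]}
```

A region reported as "present only in sample" on an infection test surface would, for example, be a candidate for a gene of the pathogen or a host gene switched on by the infection.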

A second embodiment of this method parallels the second
embodiment of the method as applied to disease above, i.e., the same process is
employed, with the exception that the plurality of defined oligonucleotide/polynucleotide
sequences forming the healthy test sample surface and the infection test sample
surface are immobilized on a single solid support. The resulting first hybridization
pattern (healthy control sample with healthy/infection test sample) and second
hybridization pattern (infection sample with healthy/infection test sample) permit
detection of those defined oligonucleotide/polynucleotides which are differentially
expressed between the healthy control and the infection sample by the presence of
differences in the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding ESTs from which the
oligonucleotide/polynucleotides are obtained.

As described above for the methods for identifying differential
gene expression between diseased and healthy animals, the
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding ESTs from which the
oligonucleotide/polynucleotide sequences are obtained and the genes expressed by the
pathogen identified for similar purposes. Other embodiments of these methods may
be developed with resort to the teaching herein, by altering the samples which provide
the defined oligonucleotide/polynucleotides. For example, an EST identified with a
differentially expressed gene by the method of this invention is also useful in
detecting genes expressed in the various stages of a pathogen's development,
particularly the infective stage, and in following the course of drug treatment and
emergence of resistant variants. For example, employing the techniques described
above, the EST can be used for detecting a gene in various stages of the parasitic
Plasmodium species life cycle, which include blood stages, liver stages, and
gametocyte stages.

B. Diagnostic Methods 

In addition to use of the methods and compositions of this
invention for identifying differentially expressed genes, another embodiment of this
invention provides diagnostic methods for diagnosing a selected disease state, or a
selected state resulting from aging, exposure to drugs or infection in an animal.
According to this aspect of the invention, a first surface, described as the healthy test
surface above, and a second surface, described as the disease test surface or infection
test surface, are prepared depending on the disease or infection to be diagnosed. The
same processes of detectable hybridization to a first and second set of these surfaces
with the healthy control sample and disease/infection sample are followed to provide
the four above-described hybridization patterns, i.e., healthy control sample with
healthy test surface; healthy control sample with disease/infection test surface;
disease/infection sample with healthy test surface; and disease/infection sample with
disease/infection test surface.

The diagnosis of disease or infection is provided by comparing
the four hybridization patterns. Substantial differences between the first and third
hybridization patterns, respectively, and the second and fourth hybridization patterns,
respectively, indicate the presence of the selected disease or infection in said animal.
Substantial similarities in the first and third hybridization patterns and second and
fourth hybridization patterns indicate the absence of disease or infection.
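
By way of illustration only, the diagnostic comparison described above may be sketched as follows. What constitutes a "substantial" difference or similarity is not quantified herein; the sketch assumes, purely for the example, that similarity is measured by the Pearson correlation of signals at corresponding pre-defined regions and that a cutoff of 0.9 separates the two cases. The function names and the "indeterminate" outcome are likewise assumptions.

```python
# Illustrative diagnosis from four hybridization patterns.
# Assumptions: each pattern is a list of signals in the same region order;
# Pearson correlation with a 0.9 cutoff stands in for "substantial"
# similarity. Patterns with constant signal are not handled.
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length signal lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def diagnose(first, second, third, fourth, cutoff=0.9):
    """Compare the first with the third pattern and the second with the fourth."""
    r13 = pearson(first, third)
    r24 = pearson(second, fourth)
    if r13 < cutoff and r24 < cutoff:
        return "disease or infection indicated"
    if r13 >= cutoff and r24 >= cutoff:
        return "no disease or infection indicated"
    return "indeterminate"
```

In practice the cutoff would have to be established empirically for each disease or infection and each test surface.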

A similar embodiment utilizes the single surface bearing both
the healthy test surface defined oligonucleotide/polynucleotides and the
disease/infection test surface defined oligonucleotide/polynucleotides as described
above. Parallel process steps as described above for detection of genes differentially
expressed in disease and infected states are followed, resulting in a first hybridization
pattern (healthy control sample with single healthy and disease/infection test sample)
and a second hybridization pattern (disease/infection sample with another copy of the
single healthy and disease/infection test sample).

Diagnosis is accomplished by comparing the two hybridization
patterns, wherein substantial differences between the first and second hybridization
patterns indicate the presence of the selected disease or infection in the animal being
tested. Substantially similar first and second hybridization patterns indicate the
absence of disease or infection. This, like many of the foregoing embodiments, may
use known or unknown ESTs derived from many libraries.

C. Other Methods of the Invention

As is obvious to one of skill in the art upon reading this
disclosure, the compositions and methods of this invention may also be used for other
similar purposes. For example, the general methods and compositions may be
adapted easily by manipulation of the samples selected to provide the standardized
defined oligonucleotide/polynucleotides, and selection of the samples selected for
hybridization thereto. One such modification is the use of this invention to identify
cell markers of any type, e.g., markers of cancer cells, stem cell markers, and the like.
Another modification involves the use of the method and compositions to generate
hybridization patterns useful for forensic identification or an 'expression fingerprint'
of genes for identification of one member of a species from another. Similarly, the
methods of this invention may be adapted for use in tissue matching for
transplantation purposes as well as for molecular histology, i.e., to enable diagnosis of
disease or disorders in pathology tissue samples such as biopsies. Still another use of
this method is in monitoring the effects of development and aging upon the gene
expression in a selected animal, by preparing surfaces bearing
oligonucleotide/polynucleotides prepared from samples of standardized younger
members of the species being tested. Additionally the patient can serve as an internal
control by virtue of having the method applied to blood samples every 5-10 years
during his lifetime.

Still another intriguing use of this method is in the area of
monitoring the effects of drugs on gene expression, both in laboratories and during
clinical trials with animals, especially humans. Because the method can be readily
adapted by altering the above parameters, it can essentially be employed to identify
differentially expressed genes of any organism, at any stage of development, and
under the influence of any factor which can affect gene expression.

IV. The Genes and Proteins Identified

Application of the compositions and methods of this invention as
above described also provides other compositions, such as any isolated gene sequence
which is differentially expressed between a normal healthy animal and an animal
having a disease or infection. Another embodiment of this invention is any isolated
pathogen gene sequence which is expressed in tissue or cell samples of an infected
animal. Similarly, an embodiment of this invention is any gene sequence identified by
the methods described herein.

These gene sequences may be employed in conventional methods to
produce isolated proteins encoded thereby. To produce a protein of this invention,
the DNA sequences of a desired gene identified by the use of the methods of this
invention or portions thereof are inserted into a suitable expression system.
Desirably, a recombinant molecule or vector is constructed in which the
polynucleotide sequence encoding the protein is operably linked to a heterologous
expression control sequence permitting expression of the human protein. Numerous
types of appropriate expression vectors and host cell systems are known in the art for
mammalian (including human) expression, insect, e.g., baculovirus expression, yeast,
fungal, and bacterial expression, by standard molecular biology techniques.

The transfection of these vectors into appropriate host cells, whether 

20 mammalian, bacterial, fungal, or insect, or into appropriate viruses, can result in 
expression of the selected proteins. Suitable host cells or cell lines for transfection, 
and viruses, as well as methods for the construction and transfection of such host cells 
and viruses are well-known. Suitable methods for transfection, culture, amplification, 
screening, and product production and purification are also known in the art.

The genes and proteins identified by this invention can be employed, if
desired, in diagnostic compositions useful for the diagnosis of a disease or infection
using conventional diagnostic assays. For example, a diagnostic reagent can be 
developed which detectably targets a gene sequence or protein of this invention in a 
biological sample of an animal. Such a reagent may be a complementary nucleotide 

30 sequence, an antibody (monoclonal, recombinant or polyclonal), or a chemically 
derived agonist or antagonist. Alternatively, the proteins and polynucleotide 
sequences of this invention, fragments of same, or complementary sequences thereto, 
may themselves be useful as diagnostic reagents for diagnosing disease states with 
which the ESTs of the invention are associated. These reagents may optionally be 

labelled using diagnostic labels, such as radioactive labels, colorimetric enzyme label
systems and the like conventionally used in diagnostic or therapeutic methods, e.g.,
Northern and Western blotting, antigen-antibody binding and the like. The selection
of the appropriate assay format and label system is within the skill of the art and may
readily be chosen without requiring additional explanation by resort to the wealth of
art in the diagnostic area.

Additionally, genes and proteins identified according to this invention 
may be used therapeutically. For example, the EST-containing gene sequences may 
be useful in gene therapy, to provide a gene sequence which in a disease is not
properly or sufficiently expressed. In such a method, a selected gene sequence of this
invention is introduced into a suitable vector or other delivery system for delivery to a 
cell containing a defect in the selected gene. Suitable delivery systems are well 
known to those of skill in the art and enable the desired EST or gene to be 

10 incorporated into the target cell and to be translated by the cell. The EST or gene 
sequence may be introduced to mutate the existing gene by recombination or provide 
an active copy thereof in addition to the inactive gene to replace its function. 

Alternatively, a protein encoded by an EST or gene of the invention 
may be useful as a therapeutic reagent for delivery of a biologically active protein, 

15 particularly when the disease state is associated with a deficiency of this protein. 
Such a protein may be incorporated into an appropriate therapeutic formulation, alone 
or in combination with other active ingredients. Methods of formulating such 
therapeutic compositions, as well as suitable pharmaceutical carriers, and the like, are 
well known to those of skill in the art. Still an additional method of delivering the
missing protein encoded by an EST, or the gene from which a selected EST was
derived, involves expressing it directly in vivo. Systems for such in vivo expression
are well known in the art.

Yet another use of the ESTs, genes identified according to the methods
of this invention, or the proteins encoded thereby is as a target for the screening and

25 development of natural or synthetic chemical compounds which have utility as 
therapeutic drugs for the treatment of disease states associated with the identified 
genes and ESTs derived therefrom. As one example, a compound capable of binding 
to such a protein encoded by such a gene and either preventing or enhancing its 
biological activity may be a useful drug component for the treatment or prevention of 

30 such disease states. 

Conventional assays and techniques may be used for the screening and 
development of such drugs. As one example, a method for identifying compounds 
which specifically bind to or inhibit or activate proteins encoded by these gene 
sequences can include simply the steps of contacting a selected protein or gene 

35 product, with a test compound to permit binding of the test compound to the protein; 
and determining the amount of test compound, if any, which is bound to the protein. 
Such a method may involve the incubation of the test compound and the protein 
immobilized on a solid support. Still other conventional methods of drug screening 

can involve employing a suitable computer program to determine compounds having

similar or complementary chemical structures to that of the gene product or portions 
thereof and screening those compounds either for competitive binding to the protein 
to detect enhanced or decreased activity in the presence of the selected compound. 

5 Thus, through use of such methods, the present invention is anticipated 

to provide compounds capable of interacting with these genes, ESTs, or encoded 
proteins, or fragments thereof, and either enhancing or decreasing the biological 
activity, as desired. Such compounds are believed to be encompassed by this 
invention. 

10 Numerous modifications and variations of the present invention are 

included in the above-identified specification and are expected to be obvious to one of 
skill in the art. Such modifications and alterations to the compositions and processes 
of the present invention are believed to be encompassed in the scope of the claims 
appended hereto. 

WHAT IS CLAIMED IS:

1. A method for identifying genes which are differentially expressed in 
two different pre-determined states of an organism comprising: 
5 a. providing a first surface on which is immobilized at pre-defined 

regions on said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST, 
an entire EST, a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample in a first 
10 state and present in excess relative to the polynucleotide to be hybridized; 

b. providing a second surface on which is immobilized at pre-defined 
regions on said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST, 
an entire EST, a fragment of a gene or an entire gene, isolated from a DNA library

15 prepared from at least one selected cell, tissue, organ or organism sample in a second 
state and present in excess relative to the polynucleotide to be hybridized; 

c. detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated from a sample from a said organism in said first 
state, said sample selected from sources analogous to the sources of step (a), said 

20 hybridization sufficient to form a first and second hybridization pattern on each said 
first and second surface, 

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from said organism in said second 
state, said sample selected from sources analogous to the sources of step (c), said 

25 hybridization sufficient to form a third and fourth hybridization pattern on each said 
first and second surface, 

e. comparing at least two of the four hybridization patterns, 
wherein genes differentially expressed in said first and second states are identified by 
the presence of differences in the hybridization patterns at pre-defined regions; 

30 f. identifying the oligonucleotide/polynucleotides on each surface 

which correspond to said pattern differences and the corresponding ESTs or larger 
gene fragment from which the oligonucleotide/polynucleotides were obtained, 
whereby identification of the EST or larger gene fragment permits identification of 
the gene from which the ESTs or larger gene fragment were derived. 

2. The method according to Claim 1 wherein said first and second states are
respectively healthy and disease; pathogen uninfected and pathogen infected; a first
progression state and a second progression of a disease or infection; a first treatment
state and a second treatment state of a disease or infection; or a first developmental
and a second developmental state.

3. The method according to Claim 1 wherein said organism is a plant or an 

animal. 

4. The method according to Claim 3 wherein said animal is a human.

5. A method for identifying genes which are differentially expressed in a 
normal healthy animal and an animal having a disease comprising: 

a. providing a first surface on which is immobilized at pre- 
15 defined regions on said surface a plurality of defined oligonucleotide/polynucleotide 

sequences, each sequence selected from the group consisting of a
fragment of an EST, an entire EST, a fragment of a gene or an entire gene, isolated
from a DNA library prepared from at least one selected cell, tissue, organ or organism 
sample in a healthy animal and present in excess relative to the polynucleotide to be 
20 hybridized; 

b. providing a second surface on which is immobilized at pre- 
defined regions of said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a
fragment of an EST, an entire EST, a fragment of a gene or an entire gene, isolated

25 from a DNA library prepared from at least one selected cell, tissue, organ or organism 
sample from an animal having said disease and present in excess relative to the 
polynucleotide to be hybridized; 

c. detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated from a sample from a healthy animal, said sample 

30 selected from sources analogous to the sources of step (a), said hybridization 
sufficient to form a first and second hybridization pattern on each said first and 
second surface, said sample selected from a cell or tissue sample analogous to the 
sample of step (a), said hybridization sufficient to form a first and second 
hybridization pattern on each said first and second surface;

d. detectably hybridizing to a set of said first and second surfaces

polynucleotide sequences isolated from a sample from an animal having said disease, 
said sample selected from a cell or tissue sample analogous to the sample of step (c), 
said hybridization sufficient to form a third and fourth hybridization pattern on each 
5 said first and second surface, 

e. comparing at least two of the four hybridization patterns, 
wherein genes differentially expressed in said first and second states are identified by 
the presence of differences in the hybridization patterns at pre-defined regions; 

f. identifying the oligonucleotide/polynucleotides on each surface 
10 which correspond to said pattern differences and the corresponding ESTs or larger 

gene fragment from which the oligonucleotide/polynucleotides were obtained, 
whereby identification of the EST or larger gene fragment permits identification of 
the gene from which the ESTs or larger gene fragment were derived. 

15 6. A method for identifying genes which are differentially expressed in a 

normal healthy animal and an animal having a disease comprising: 

a. providing a surface on which is immobilized at pre-defined 
regions on said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST, 

an entire EST, a fragment of a gene or an entire gene isolated from a DNA library
prepared from the group selected from at least one selected cell, tissue, organ or 
organism sample in of a healthy animal and an analogous selected sample of an 
animal having said disease and both present in excess relative to the polynucleotide to 
be hybridized; 

25 b. detectably hybridizing to a first copy of said surface 

polynucleotide sequences isolated from a healthy animal, said sample selected from a 
cell or tissue sample analogous to the sample of step (a), said hybridization sufficient 
to form a first hybridization pattern on said surface; 

c. detectably hybridizing to a second copy of said surface 
30 polynucleotide sequences isolated from an animal having said disease, said sample 

selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form a second hybridization pattern on said surface; 

d. comparing the two hybridization patterns, wherein genes 
differentially expressed in a disease state are identified by the presence of differences 

in the hybridization patterns at pre-defined regions;

e. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs from which 
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST 
permits identification of the gene from which the ESTs were derived. 

7. A method for identifying a gene of a pathogen which is expressed in a 
biological sample of an animal infected with said pathogen comprising: 

a. providing a first surface on which is immobilized at pre- 
defined regions on said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST, 
an entire EST, a fragment of a gene or an entire gene isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample of a 
healthy, uninfected animal and present in excess relative to the polynucleotide to be 
hybridized; 

b. providing a second surface on which is immobilized at pre-
defined regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene isolated from at least one
selected cell, tissue, organ or organism sample of an infected animal; 

20 c. detectably hybridizing to a set of said first and second surfaces 

polynucleotide sequences isolated from a sample from a healthy animal, said sample 
selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form first and second hybridization patterns on each said 
first and second surface, 

25 d. detectably hybridizing to a set of said first and second surfaces 

polynucleotide sequences isolated from a sample from an infected animal, said 
sample selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form third and fourth hybridization patterns on each said 
first and second surface, 

30 e. comparing the four hybridization patterns, wherein genes of 

said pathogen which are expressed in an infected animal are identified by the 
presence of differences in the hybridization patterns at pre-defined regions; 

f. identifying the oligonucleotide/polynucleotides on each surface 
which correspond to said pattern differences and the corresponding ESTs from which 

35 the oligonucleotide/polynucleotides are obtained, whereby identification of the EST 
permits identification of the gene from which the ESTs were derived.

8. A method for identifying a gene of a pathogen which is expressed in a

biological sample of an animal infected with said pathogen comprising: 

a. providing a surface on which is immobilized at pre-defined 
regions on said surface a plurality of defined oligonucleotide/polynucleotide 

sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene isolated from a DNA library
prepared from the group selected from at least one selected cell, tissue, organ or
organism sample in of a healthy animal and an analogous selected sample of an
animal having said disease and both present in excess relative to the polynucleotide to
be hybridized;

b. detectably hybridizing to a first copy of said surface 
polynucleotide sequences isolated from a sample from a healthy animal, said sample 
selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form a first hybridization pattern on said surface; 

c. detectably hybridizing to a second copy of said surface

polynucleotide sequences isolated from a sample from an infected animal, said 
sample selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form a second hybridization pattern on said surface; 

d. comparing the two hybridization patterns, wherein genes of

20 said pathogen which are expressed in an infected animal are identified by the 
presence of differences in the hybridization patterns at pre-defined regions; 

e. identifying the oligonucleotide/polynucleotides on each surface 
which correspond to said pattern differences and the corresponding ESTs from which 
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST 

25 permits identification of the gene from which the ESTs were derived. 

9. A composition suitable for use in hybridization comprising a solid 
surface on which is immobilized at pre-defined regions on said surface a plurality of 
defined oligonucleotide/polynucleotide sequences for hybridization, each sequence 

selected from the group consisting of a fragment of an EST, an entire EST, a fragment
of a gene or an entire gene isolated from a DNA library prepared from the group 
selected from at least one selected cell, tissue, organ or organism sample of a healthy 
animal, at least one analogous sample of said animal having a disease, at least one 
analogous sample of said animal infected with a microbial pathogen, and any 

combination thereof.

10. An isolated gene sequence which is differentially expressed in a
normal healthy animal and an animal having a disease, identified by the method of 
claim 1. 

5 11. An isolated pathogen gene sequence which is expressed in tissue or 

cell samples of an infected animal identified by the method of claim 7. 

12. A diagnostic composition useful for the diagnosis of a disease 
comprising a reagent capable of detectably targeting a gene sequence of claim 10 in a 

10 biological sample of an animal. 

13. A diagnostic composition useful for the diagnosis of infection by a 
pathogen comprising a reagent capable of detectably targeting a gene sequence of 
claim 11 in a biological sample of an animal.






14. An isolated protein produced by expression of a gene sequence of 
claim 10. 



15. An isolated pathogen protein produced by expression of a gene 
20 sequence of claim 11. 



16. A therapeutic composition comprising a protein or fragment thereof 
selected from the group consisting of a protein of claim 10 and a protein of claim 15. 

25 17. A method for diagnosing a selected disease or infection in an animal 

comprising: 

a. providing a first surface on which is immobilized at pre- 
defined regions on said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST, 

an entire EST, a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample of a healthy 
animal and present in excess relative to the polynucleotide to be hybridized; 

b. providing a second surface on which is immobilized at pre- 
defined regions of said surface a plurality of defined oligonucleotide/polynucleotide 

35 sequences, each sequence comprising a fragment of an EST isolated from at least one 
said tissue of an animal having said disease;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a DNA library prepared from a sample from a
healthy animal, said sample selected from a cell or tissue sample analogous to the 
sample of step (a), said hybridization sufficient to form a first and second 
5 hybridization pattern on each said first and second surface; 

d. detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated from a DNA library prepared from a sample from 
an animal having said disease, said sample selected from a cell or tissue sample 
analogous to the sample of step (c), said hybridization sufficient to form a third and 

10 fourth hybridization pattern on each said first and second surface; 

e. comparing the four hybridization patterns, wherein substantial 
differences between the first and third hybridization patterns and the second and 
fourth hybridization patterns indicates the presence of said selected disease or 
infection in said animal, and substantial similarities in said first and third 

15 hybridization patterns and second and fourth hybridization patterns indicates the 
absence of disease or infection. 



18. A method for diagnosing a selected disease or infection in an animal 
comprising: 

20 a. providing a surface on which is immobilized at pre-defined 

regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence comprising a fragment of an EST isolated from a DNA 
library prepared from the group consisting of a selected cell or tissue sample of a 
healthy animal and an analogous selected cell or tissue sample of an animal having

25 said disease; 

b. detectably hybridizing to a first copy of said surface 
polynucleotide sequences isolated from a sample from a healthy animal, said sample 
selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form a first hybridization pattern on said surface; 

30 c. detectably hybridizing to a second copy of said surface 

polynucleotide sequences isolated from a DNA library prepared from a sample from 
an animal having said disease, said sample selected from a cell or tissue sample 
analogous to the sample of step (a), said hybridization sufficient to form a second 
hybridization pattern on said surface; 

35 d. comparing the two hybridization patterns, wherein substantial 

differences between the first and second hybridization patterns indicates the presence 
of said selected disease or infection in said animal, and substantial similarities in said 
first and second hybridization patterns indicates the absence of disease or infection. 
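
As an illustrative sketch only, the comparison of hybridization patterns at pre-defined regions recited in the claims above could be carried out computationally along the following lines. The claims do not specify any data representation or numerical threshold; the dictionary layout, the function name `differing_regions`, and the `min_fold` and `floor` parameters are assumptions introduced here purely for illustration.

```python
def differing_regions(pattern_a, pattern_b, min_fold=2.0, floor=1.0):
    """Return the pre-defined regions whose signals differ between two patterns.

    Each pattern maps a region identifier (e.g. a grid position) to a
    measured signal intensity. `min_fold` and `floor` are hypothetical
    parameters: a region is reported when the larger signal is at least
    `min_fold` times the smaller, after clamping both to `floor` so that
    absent or zero signals do not cause division by zero.
    """
    hits = []
    for region in sorted(set(pattern_a) | set(pattern_b)):
        a = max(pattern_a.get(region, 0.0), floor)
        b = max(pattern_b.get(region, 0.0), floor)
        if max(a, b) / min(a, b) >= min_fold:
            hits.append(region)
    return hits


# Hypothetical signal intensities for three regions on one surface.
healthy = {"A1": 100.0, "A2": 50.0, "A3": 5.0}
diseased = {"A1": 110.0, "A2": 200.0, "A3": 0.0}
print(differing_regions(healthy, diseased))
```

The regions reported would then be traced back to the immobilized sequences at those positions, as in the identifying step of the claims.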
COMPARATIVE GENE TRANSCRIPT ANALYSIS
1. FIELD OF INVENTION

The present invention is in the field of molecular 
biology and computer science; more particularly, the 
5 present invention describes methods of analyzing gene 

transcripts and diagnosing the genetic expression of cells 
and tissue. 

2. BACKGROUND OF THE INVENTION 
Until very recently, the history of molecular biology 
10 has been written one gene at a time. Scientists have 
observed the cell's physical changes, isolated mixtures 
from the cell or its milieu, purified proteins, sequenced 
proteins and therefrom constructed probes to look for the 
corresponding gene.

Recently, different nations have set up massive 
projects to sequence the billions of bases in the human 
genome. These projects typically begin with dividing the 
genome into large portions of chromosomes and then 
determining the sequences of these pieces, which are then
analyzed for identity with known proteins or portions 
thereof, known as motifs. Unfortunately, the majority of 
genomic DNA does not encode proteins and though it is 
postulated to have some effect on the cell's ability to 
make protein, its relevance to medical applications is not 
25 understood at this time. 

A third methodology involves sequencing only the 
transcripts encoding the cellular machinery actively 
involved in making protein, namely the mRNA. The advantage 
is that the cell has already edited out all the non-coding 
30 DNA, and it is relatively easy to identify the protein- 
coding portion of the RNA. The utility of this approach 
was not immediately obvious to genomic researchers. In
fact, when cDNA sequencing was initially proposed, the
method was roundly denounced by those committed to genomic
sequencing. For example, the head of the U.S. Human Genome
project discounted cDNA sequencing as not valuable and
refused to approve funding of projects.

In this disclosure, we teach methods for analyzing 
DNA, including cDNA libraries. Based on our analyses and
research, we see each individual gene product as a "pixel"
of information, which relates to the expression of that,
and only that, gene. We teach herein methods whereby the
individual "pixels" of gene expression information can be
combined into a single gene transcript "image," in which
each of the individual genes can be visualized
simultaneously, allowing relationships between the gene
pixels to be easily visualized and understood.
We further teach a new method which we call electronic
subtraction. Electronic subtraction will enable the gene
researcher to turn a single image into a moving picture,
one which describes the temporality or dynamics of gene
expression, at the level of a cell or a whole tissue. It
is that sense of "motion" of cellular machinery on the
scale of a cell or organ which constitutes the new
invention herein. This constitutes a new view into the
process of living cell physiology and one which holds great 
promise to unveil and discover new therapeutic and 
diagnostic approaches in medicine. 
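
One way to picture electronic subtraction is as a gene-by-gene comparison of two transcript images. The sketch below is an assumption-laden illustration rather than the procedure of this disclosure: it represents each image as a mapping from gene to relative abundance and simply takes the difference. The function name `electronic_subtraction` and the example values are hypothetical.

```python
def electronic_subtraction(image_a, image_b):
    """Compare two transcript images given as {gene: relative abundance}.

    Returns (gene, abundance_a - abundance_b) pairs sorted from the gene
    most enriched in image_a to the gene most enriched in image_b. A gene
    absent from one image is treated as having abundance 0.0 there.
    """
    genes = set(image_a) | set(image_b)
    diffs = {g: image_a.get(g, 0.0) - image_b.get(g, 0.0) for g in genes}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical relative abundances in two libraries.
after = {"geneA": 0.5, "geneB": 0.25, "geneC": 0.25}
before = {"geneA": 0.5, "geneB": 0.5}
for gene, change in electronic_subtraction(after, before):
    print(f"{gene}: {change:+.2f}")
```

Unlike a physical subtraction, the comparison is symmetric: the same two images show both the transcripts that rise and those that fall.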
We teach another method which we call "electronic
northern," which tracks the expression of a single gene
across many types of cells and tissues. 
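
An "electronic northern" can likewise be sketched as a lookup of one gene across a collection of transcript images. The following is a minimal illustration under the assumption that each library is stored as a mapping from gene to relative abundance; the function name `electronic_northern`, the library names and the abundance values are hypothetical.

```python
def electronic_northern(gene, images):
    """Report one gene's relative abundance in every library.

    `images` maps a library name (e.g. a tissue or cell type) to a
    {gene: relative abundance} transcript image. Libraries in which the
    gene was not observed report 0.0.
    """
    return {name: image.get(gene, 0.0) for name, image in images.items()}


# Hypothetical transcript images for three libraries.
images = {
    "liver": {"albumin": 0.25, "actin": 0.125},
    "brain": {"actin": 0.5},
    "monocyte": {"albumin": 0.0, "IL-8": 0.25},
}
for library, abundance in electronic_northern("albumin", images).items():
    print(f"{library:10s} {abundance:.3f}")
```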

Nucleic acids (DNA and RNA) carry within their 
sequence the hereditary information and are therefore the 
25 prime molecules of life. Nucleic acids are found in all 

living organisms including bacteria, fungi, viruses, plants 
and animals. It is of interest to determine the relative 
abundance of different discrete nucleic acids in different 
cells, tissues and organisms over time under various 
30 conditions, treatments and regimes. 

All dividing cells in the human body contain the same 
set of 23 pairs of chromosomes. It is estimated that these
autosomal and sex chromosomes encode approximately 100,000 
genes. The differences among different types of cells are 
35 believed to reflect the differential expression of the 
100,000 or so genes. Fundamental questions of biology 
could be answered by understanding which genes are 
transcribed and knowing the relative abundance of 
transcripts in different cells. 



WO 95/20681 PCT/US95/01160

Previously, the art has only provided for the analysis 
of a few known genes at a time by standard molecular
biology techniques such as PCR, northern blot analysis, or 
other types of DNA probe analysis such as in situ 
hybridization. Each of these methods allows one to analyze
the transcription of only known genes and/or small numbers
of genes at a time. Nucl. Acids Res. 19, 7097-7104 (1991);
Nucl. Acids Res. 18, 4833-42 (1990); Nucl. Acids Res. 18,
2789-92 (1989); European J. Neuroscience 2, 1063-1073
(1990); Analytical Biochem. 187, 364-73 (1990); Genet.
Annals Techn. Appl. 7, 64-70 (1990); GATA 8(4), 129-33
(1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988);
Nucl. Acids Res. 19, 1954 (1991); Proc. Natl. Acad. Sci.
USA 88, 1943-47 (1991); Nucl. Acids Res. 19, 6123-27
(1991); Proc. Natl. Acad. Sci. USA 85, 5738-42 (1988);
Nucl. Acids Res. 16, 10937 (1988).

Studies of the number and types of genes whose 
transcription is induced or otherwise regulated during cell 
processes such as activation, differentiation, aging, viral 
20 transformation, morphogenesis, and mitosis have been 

pursued for many years, using a variety of methodologies. 
One of the earliest methods was to isolate and analyze 
levels of the proteins in a cell, tissue, organ system, or 
even organisms both before and after the process of 
25 interest. One method of analyzing multiple proteins in a 
sample is using 2-dimensional gel electrophoresis, wherein 
proteins can be, in principle, identified and quantified as 
individual bands, and ultimately reduced to a discrete 
signal. At present, 2-dimensional analysis only resolves 
approximately 15% of the proteins. In order to positively
analyze those bands which are resolved, each band must be 
excised from the membrane and subjected to protein sequence 
analysis using Edman degradation. Unfortunately, most of 
the bands were present in quantities too small to obtain a 
35 reliable sequence, and many of those bands contained more 
than one discrete protein. An additional difficulty is 
that many of the proteins were blocked at the 
amino-terminus, further complicating the sequencing 
process.

Analyzing differentiation at the gene transcription
level has overcome many of these disadvantages and
drawbacks, since the power of recombinant DNA technology 
allows amplification of signals containing very small 
5 amounts of material. The most common method, called 
"hybridization subtraction," involves isolation of mRNA 
from the biological specimen before (B) and after (A) the 
developmental process of interest, transcribing one set of 
mRNA into cDNA, subtracting specimen B from specimen A 
10 (mRNA from cDNA) by hybridization, and constructing a cDNA 
library from the non-hybridizing mRNA fraction. Many 
different groups have used this strategy successfully, and 
a variety of procedures have been published and improved 
upon using this same basic scheme. Nucl. Acids Res. 19,
7097-7104 (1991); Nucl. Acids Res. 18, 4833-42 (1990);
Nucl. Acids Res. 18, 2789-92 (1989); European J.
Neuroscience 2, 1063-1073 (1990); Analytical Biochem. 187,
364-73 (1990); Genet. Annals Techn. Appl. 7, 64-70 (1990);
GATA 8(4), 129-33 (1991); Proc. Natl. Acad. Sci. USA 85,
1696-1700 (1988); Nucl. Acids Res. 19, 1954 (1991); Proc.
Natl. Acad. Sci. USA 88, 1943-47 (1991); Nucl. Acids Res.
19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85, 5738-42
(1988); Nucl. Acids Res. 16, 10937 (1988).

Although each of these techniques has particular
strengths and weaknesses, there are still some limitations
and undesirable aspects of these methods. First, the time
and effort required to construct such libraries are quite
large. Typically, a trained molecular biologist might
expect construction and characterization of such a library
to require 3 to 6 months, depending on the level of skill,
experience, and luck. Second, the resulting subtraction
libraries are typically inferior to the libraries
constructed by standard methodology. A typical
conventional cDNA library should have a clone complexity of
at least 10^6 clones, and an average insert size of 1-3 kB.
In contrast, subtracted libraries can have complexities of
10^2 or 10^3 and average insert sizes of 0.2 kB. Therefore,
there can be a significant loss of clone and sequence
information associated with such libraries. Third, this
approach allows the researcher to capture only the genes
induced in specimen A relative to specimen B, not
vice-versa, nor does it easily allow comparison to a third
specimen of interest (C). Fourth, this approach requires
very large amounts (hundreds of micrograms) of "driver"
mRNA (specimen B), which significantly limits the number
and type of subtractions that are possible since many
tissues and cells are very difficult to obtain in large
quantities.

Fifth, the resolution of the subtraction is dependent 
upon the physical properties of DNA:DNA or RNA:DNA 
hybridization. The ability of a given sequence to find a 
hybridization match is dependent on its unique CoT value. 
The CoT value is a function of the number of copies 
(concentration) of the particular sequence, multiplied by 
the time of hybridization. It follows that for sequences 
which are abundant, hybridization events will occur very 
rapidly (low CoT value), while rare sequences will form 
duplexes at very high CoT values. CoT values which allow 
such rare sequences to form duplexes and therefore be 
effectively selected are difficult to achieve in a 
convenient time frame. Therefore, hybridization 
subtraction is simply not a useful technique with which to 
study relative levels of rare mRNA species. Sixth, this 
problem is further complicated by the fact that duplex 
formation is also dependent on the nucleotide base 
composition for a given sequence. Those sequences rich in 
G + C form stronger duplexes than those with high contents 
of A + T. Therefore, the former sequences will tend to be 
removed selectively by hybridization subtraction. Seventh, 
it is possible that hybridization between nonexact matches 
can occur. When this happens, the expression of a 
homologous gene may "mask" expression of a gene of 
interest, artificially skewing the results for that 
particular gene. 

Matsubara and Okubo proposed using partial cDNA 
sequences to establish expression profiles of genes which 
could be used in functional analyses of the human genome. 
Matsubara and Okubo warned against using random priming, as 
it creates multiple unique DNA fragments from individual 
mRNAs and may thus skew the analysis of the number of 
particular mRNAs per library. They sequenced randomly 
selected members from a 3'-directed cDNA library and 
established the frequency of appearance of the various 
ESTs. They proposed comparing lists of ESTs from various 
cell types to classify genes. Genes expressed in many 
different cell types were labeled housekeepers and those 
selectively expressed in certain cells were labeled 
cell-specific genes, even in the absence of the full sequence of 
the gene or the biological activity of the gene product. 

The present invention avoids the drawbacks of the 
prior art by providing a method to quantify the relative 
abundance of multiple gene transcripts in a given 
biological specimen by the use of high-throughput 
sequence-specific analysis of individual RNAs and/or their 
corresponding cDNAs. 

The present invention offers several advantages over 
current protein discovery methods which attempt to isolate 
individual proteins based upon biological effects. The 
method of the instant invention provides for detailed 
diagnostic comparisons of cell profiles revealing numerous 
changes in the expression of individual transcripts. 

The instant invention provides several advantages over 
current subtraction methods including a more complex 
library analysis (10^6 to 10^7 clones as compared to 10^3 
clones) which allows identification of low abundance 
messages as well as enabling the identification of messages 
which either increase or decrease in abundance. These 
large libraries are very routine to make in contrast to the 
libraries of previous methods. In addition, homologues can 
easily be distinguished with the method of the instant 
invention. 

This method is very convenient because it organizes a 
large quantity of data into a comprehensible, digestible 
format. The most significant differences are highlighted 
by electronic subtraction. In-depth analyses are made more 
convenient. 

The present invention provides several advantages over 
previous methods of electronic analysis of cDNA. The 
method is particularly powerful when more than 100 and 
preferably more than 1,000 gene transcripts are analyzed. 
In such a case, new low-frequency transcripts are 
discovered and tissue typed. 

High resolution analysis of gene expression can be 
used directly as a diagnostic profile or to identify 
disease-specific genes for the development of more classic 
diagnostic approaches. 

This process is defined as gene transcript frequency 
analysis. The resulting quantitative analysis of the gene 
transcripts is defined as comparative gene transcript 
analysis. 






3. SUMMARY OF THE INVENTION 

The invention is a method of analyzing a specimen 
containing gene transcripts comprising the steps of (a) 
producing a library of biological sequences; (b) generating 
a set of transcript sequences, where each of the transcript 
sequences in said set is indicative of a different one of 
the biological sequences of the library; (c) processing the 
transcript sequences in a programmed computer (in which a 
database of reference transcript sequences indicative of 
reference sequences is stored), to generate an identified 
sequence value for each of the transcript sequences, where 
each said identified sequence value is indicative of 
sequence annotation and a degree of match between one of 
the biological sequences of the library and at least one of 
the reference sequences; and (d) processing each said 
identified sequence value to generate final data values 
indicative of the number of times each identified sequence 
value is present in the library. 
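
Steps (c) and (d) above amount to a tally. The following is 
a minimal Python sketch, assuming each identified sequence 
value can be reduced to an annotation plus a match category. 
The IdentifiedSequence type, the tabulate_library helper and 
the sample entries are hypothetical illustrations and are not 
taken from the program of Table 5. 

```python
from collections import Counter
from typing import Iterable, NamedTuple


class IdentifiedSequence(NamedTuple):
    """Hypothetical identified sequence value: annotation plus match category."""
    annotation: str  # e.g. a database locus name (assumed representation)
    match: str       # "exact", "similar" or "non-match"


def tabulate_library(values: Iterable[IdentifiedSequence]) -> Counter:
    """Step (d): count how often each identified sequence value occurs."""
    return Counter(values)


# Illustrative data only; these are not entries from any table in this document.
library = [
    IdentifiedSequence("LOCUS_A", "exact"),
    IdentifiedSequence("LOCUS_B", "similar"),
    IdentifiedSequence("LOCUS_A", "exact"),
    IdentifiedSequence("LOCUS_A", "exact"),
]
final_data_values = tabulate_library(library)
```

Keying the tally on both fields keeps an exact match and a 
merely similar match to the same reference sequence as 
separate final data values, which is one possible reading of 
"degree of match" above. 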

The invention also includes a method of comparing two 
specimens containing gene transcripts. The first specimen 
is processed as described above. The second specimen is 
used to produce a second library of biological sequences, 
which is used to generate a second set of transcript 
sequences, where each of the transcript sequences in the 
second set is indicative of one of the biological sequences 
of the second library. Then the second set of transcript 
sequences is processed in a programmed computer to generate 
a second set of identified sequence values, namely the 
further identified sequence values, each of which is 
indicative of a sequence annotation and includes a degree 
of match between one of the biological sequences of the 
second library and at least one of the reference sequences. 
The further identified sequence values are processed to 
generate further final data values indicative of the number 
of times each further identified sequence value is present 
in the second library. The final data values from the 
first specimen and the further identified sequence values 
from the second specimen are processed to generate ratios 
of transcript sequences, which indicate the differences in 
the number of gene transcripts between the two specimens. 

In a further embodiment, the method includes 
quantifying the relative abundance of mRNA in a biological 
specimen by (a) isolating a population of mRNA transcripts 
from a biological specimen; (b) identifying genes from 
which the mRNA was transcribed by a sequence-specific 
method; (c) determining the numbers of mRNA transcripts 
corresponding to each of the genes; and (d) using the mRNA 
transcript numbers to determine the relative abundance of 
mRNA transcripts within the population of mRNA transcripts. 

Also disclosed is a method of producing a gene 
transcript image analysis by first obtaining a mixture of 
mRNA, from which cDNA copies are made. The cDNA is 
inserted into a suitable vector which is used to transfect 
suitable host strain cells which are plated out and 
permitted to grow into clones, each clone representing a 
unique mRNA. A representative population of clones 
transfected with cDNA is isolated. Each clone in the 
population is identified by a sequence-specific method 
which identifies the gene from which the unique mRNA was 
transcribed. The number of times each gene is identified 
to a clone is determined to evaluate gene transcript 
abundance. The genes and their abundances are listed in 
order of abundance to produce a gene transcript image. 

In a further embodiment, the relative abundance of the 
gene transcripts in one cell type or tissue is compared 
with the relative abundance of gene transcript numbers in a 
second cell type or tissue in order to identify the 
differences and similarities. 

In a further embodiment, the method includes a system 
for analyzing a library of biological sequences including a 
means for receiving a set of transcript sequences, where 
each of the transcript sequences is indicative of a 
different one of the biological sequences of the library; 
and a means for processing the transcript sequences in a 
computer system in which a database of reference transcript 
sequences indicative of reference sequences is stored, 
wherein the computer is programmed with software for 
generating an identified sequence value for each of the 
transcript sequences, where each said identified sequence 
value is indicative of a sequence annotation and the degree 
of match between a different one of the biological 
sequences of the library and at least one of the reference 
sequences, and for processing each said identified sequence 
value to generate final data values indicative of the 
number of times each identified sequence value is present 
in the library. 

In essence, the invention is a method and system for 
quantifying the relative abundance of gene transcripts in a 
biological specimen. The invention provides a method for 
comparing the gene transcript image from two or more 
different biological specimens in order to distinguish 
between the two specimens and identify one or more genes 
which are differentially expressed between the two 
specimens. Thus, this gene transcript image and its 
comparison can be used as a diagnostic. One embodiment of 
the method generates high-throughput sequence-specific 
analysis of multiple RNAs or their corresponding cDNAs: a 
gene transcript image. Another embodiment of the method 
produces the gene transcript imaging analysis by the use of 
high-throughput cDNA sequence analysis. In addition, two 
or more gene transcript images can be compared and used to 
detect or diagnose a particular biological state, disease, 
or condition which is correlated to the relative abundance 
of gene transcripts in a given cell or population of cells. 

4. DESCRIPTION OF THE TABLES AND DRAWINGS 
4.1. TABLES 

Table 1 presents a detailed explanation of the letter 
codes utilized in Tables 2-5. 

Table 2 lists the one hundred most common gene 
transcripts. It is a partial list of isolates from the 
HUVEC cDNA library prepared and sequenced as described 
below. The left-hand column refers to the sequence's order 
of abundance in this table. The next column labeled 
"number" is the clone number of the first HUVEC sequence 
identification reference matching the sequence in the 
"entry" column number. Isolates that have not been 
sequenced are not present in Table 2. The next column, 
labeled "N", indicates the total number of cDNAs which have 
the same degree of match with the sequence of the reference 
transcript in the "entry" column. 

The column labeled "entry" gives the NIH GENBANK locus 
name, which corresponds to the library sequence numbers. 
The "s" column indicates in a few cases the species of the 
reference sequence. The code for column "s" is given in 
Table 1. The column labeled "descriptor" provides a plain 
English explanation of the identity of the sequence 
corresponding to the NIH GENBANK locus name in the "entry" 
column. 

Table 3 is a comparison of the top fifteen most 
abundant gene transcripts in normal monocytes and activated 
macrophage cells. 

Table 4 is a detailed summary of a library subtraction 
analysis comparing the THP-1 and human macrophage 
cDNA sequences. In Table 4, the same code as in Table 2 is 
used. Additional columns are for "bgfreq" (abundance 
number in the subtractant library), "rfend" (abundance 
number in the target library) and "ratio" (the target 
abundance number divided by the subtractant abundance 
number). As is clear from perusal of the table, when the 
abundance number in the subtractant library is "0", the 
target abundance number is divided by 0.05. This is a way 
of obtaining a result (not possible dividing by 0) and 
distinguishing the result from ratios of subtractant 
numbers of 1. 
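
The ratio rule just described can be sketched in a few lines 
of Python. This is an illustration under stated assumptions, 
not the program of Table 5: the function names, the 
dictionary layout and the sample counts are hypothetical, 
and only the column names and the 0.05 divisor come from the 
text above. 

```python
from typing import Dict, List, Tuple

ZERO_SUBSTITUTE = 0.05  # divisor used when the subtractant abundance is 0


def abundance_ratio(rfend: int, bgfreq: int) -> float:
    """Target abundance divided by subtractant abundance (0 replaced by 0.05)."""
    divisor = bgfreq if bgfreq > 0 else ZERO_SUBSTITUTE
    return rfend / divisor


def subtraction_profile(
    target: Dict[str, int], subtractant: Dict[str, int]
) -> List[Tuple[str, int, int, float]]:
    """Return (entry, bgfreq, rfend, ratio) rows, largest ratio first."""
    rows = []
    for entry, rfend in target.items():
        bgfreq = subtractant.get(entry, 0)  # absent entries count as 0
        rows.append((entry, bgfreq, rfend, abundance_ratio(rfend, bgfreq)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Illustrative counts only.
target_library = {"LOCUS_A": 3, "LOCUS_B": 3, "LOCUS_C": 8}
subtractant_library = {"LOCUS_B": 1, "LOCUS_C": 4}
profile = subtraction_profile(target_library, subtractant_library)
```

With these counts, LOCUS_A (absent from the subtractant 
library) receives a ratio of about 60, while LOCUS_B, with 
the same target abundance but a subtractant abundance of 1, 
receives a ratio of 3, so the two cases remain 
distinguishable. 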

Table 5 is the computer program, written in source 
code, for generating gene transcript subtraction profiles. 

Table 6 is a partial listing of database entries used 
in the electronic northern blot analysis as provided by the 
present invention. 

4.2. BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a chart summarizing data collected and 
stored regarding the library construction portion of 
sequence preparation and analysis. 

Figure 2 is a diagram representing the sequence of 
operations performed by "abundance sort" software in a 
class of preferred embodiments of the inventive method. 

Figure 3 is a block diagram of a preferred embodiment 
of the system of the invention. 

Figure 4 is a more detailed block diagram of the 
bioinformatics process from new sequence (that has already 
been sequenced but not identified) to printout of the 
transcript imaging analysis and the provision of database 
subscriptions. 

5. DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides a method to compare the 
relative abundance of gene transcripts in different 
biological specimens by the use of high-throughput 
sequence-specific analysis of individual RNAs or their 
corresponding cDNAs (or alternatively, of data representing 
other biological sequences). This process is denoted 
herein as gene transcript imaging. The quantitative 
analysis of the relative abundance for a set of gene 
transcripts is denoted herein as "gene transcript image 
analysis" or "gene transcript frequency analysis". The 
present invention allows one to obtain a profile for gene 
transcription in any given population of cells or tissue 
from any type of organism. The invention can be applied to 
obtain a profile of a specimen consisting of a single cell 
(or clones of a single cell), or of many cells, or of 
tissue more complex than a single cell and containing 
multiple cell types, such as liver. 

The invention has significant advantages in the fields 
of diagnostics, toxicology and pharmacology, to name a few. 
A highly sophisticated diagnostic test can be performed on 
the ill patient in whom a diagnosis has not been made. A 
biological specimen consisting of the patient's fluids or 
tissues is obtained, and the gene transcripts are isolated 
and expanded to the extent necessary to determine their 
identity. Optionally, the gene transcripts can be 
converted to cDNA. A sampling of the gene transcripts is 
subjected to sequence-specific analysis and quantified. 
These gene transcript sequence abundances are compared 
against reference database sequence abundances including 
normal data sets for diseased and healthy patients. The 
patient has the disease(s) with which the patient's data 
set most closely correlates. 

For example, gene transcript frequency analysis can be 
used to differentiate normal cells or tissues from diseased 
cells or tissues, just as it highlights differences between 
normal monocytes and activated macrophages in Table 3. 

In toxicology, a fundamental question is which tests 
are most effective in predicting or detecting a toxic 
effect. Gene transcript imaging provides highly detailed 
information on the cell and tissue environment, some of 
which would not be obvious in conventional, less detailed 
screening methods. The gene transcript image is a more 
powerful method to predict drug toxicity and efficacy. 
Similar benefits accrue in the use of this tool in 
pharmacology. The gene transcript image can be used 
selectively to look at protein categories which are 
expected to be affected, for example, enzymes which 
detoxify toxins. 

In an alternative embodiment, comparative gene 
transcript frequency analysis is used to differentiate 
between cancer cells which respond to anti-cancer agents 
and those which do not respond. Examples of anti-cancer 
agents are tamoxifen, vincristine, vinblastine, 
podophyllotoxins, etoposide, teniposide, cisplatin, 
biologic response modifiers such as interferon, IL-2, 
GM-CSF, enzymes, hormones and the like. This method also 
provides a means for sorting the gene transcripts by 
functional category. In the case of cancer cells, 
transcription factors or other essential regulatory 
molecules are very important categories to analyze across 
different libraries. 

In yet another embodiment, comparative gene transcript 
frequency analysis is used to differentiate between control 
liver cells and liver cells isolated from patients treated 
with experimental drugs like FIAU to distinguish between 
pathology caused by the underlying disease and that caused 
by the drug. 

In yet another embodiment, comparative gene transcript 
frequency analysis is used to differentiate between brain 
tissue from patients treated and untreated with lithium. 

In a further embodiment, comparative gene transcript 
frequency analysis is used to differentiate between 
cyclosporin and FK506-treated cells and normal cells. 

In a further embodiment, comparative gene transcript 
frequency analysis is used to differentiate between virally 
infected (including HIV-infected) human cells and 
uninfected human cells. Gene transcript frequency analysis 
is also used to rapidly survey gene transcripts in 
HIV-resistant, HIV-infected, and HIV-sensitive cells. 
Comparison of gene transcript abundance will indicate the 
success of treatment and/or new avenues to study. 

In a further embodiment, comparative gene transcript 
frequency analysis is used to differentiate between 
bronchial lavage fluids from healthy and unhealthy patients 
with a variety of ailments. 

In a further embodiment, comparative gene transcript 
frequency analysis is used to differentiate between cell, 
plant, microbial and animal mutants and wild-type species. 
In addition, the transcript abundance program is adapted to 
permit the scientist to evaluate the transcription of one 
gene in many different tissues. Such comparisons could 
identify deletion mutants which do not produce a gene 
product and point mutants which produce a less abundant or 
otherwise different message. Such mutations can affect 
basic biochemical and pharmacological processes, such as 
mineral nutrition and metabolism, and can be isolated by 
means known to those skilled in the art. Thus, crops with 
improved yields, pest resistance and other factors can be 
developed. 

In a further embodiment, comparative gene transcript 
frequency analysis is used for an interspecies comparative 
analysis which would allow for the selection of better 
pharmacologic animal models. In this embodiment, humans 
and other animals (such as a mouse), or their cultured 
cells, are treated with a specific test agent. The relative 
sequence abundance of each cDNA population is determined. 
If the animal test system is a good model, homologous genes 
in the animal cDNA population should change expression 
similarly to those in human cells. If side effects are 
detected with the drug, a detailed transcript abundance 
analysis will be performed to survey gene transcript 
changes. Models will then be evaluated by comparing basic 
physiological changes. 

In a further embodiment, comparative gene transcript 
frequency analysis is used in a clinical setting to give a 
highly detailed gene transcript profile of a patient's 
cells or tissue (for example, a blood sample). In 
particular, gene transcript frequency analysis is used to 
give a high resolution gene expression profile of a 
diseased state or condition. 

In the preferred embodiment, the method utilizes 
high-throughput cDNA sequencing to identify specific 
transcripts of interest. The generated cDNA and deduced 
amino acid sequences are then extensively compared with 
GENBANK and other sequence data banks as described below. 
The method offers several advantages over current protein 
discovery by two-dimensional gel methods which try to 
identify individual proteins involved in a particular 
biological effect. Here, detailed comparisons of profiles 
of activated and inactive cells reveal numerous changes in 
the expression of individual transcripts. After it is 
determined if the sequence is an "exact" match, similar or 
a non-match, the sequence is entered into a database. 
Next, the numbers of copies of cDNA corresponding to each 
gene are tabulated. Although this can be done slowly and 
arduously, if at all, by human hand from a printout of all 
entries, a computer program is a useful and rapid way to 
tabulate this information. The numbers of cDNA copies 
(optionally divided by the total number of sequences in the 
data set) provide a picture of the relative abundance of 
transcripts for each corresponding gene. The list of 
represented genes can then be sorted by abundance in the 
cDNA population. A multitude of additional types of 
comparisons or dimensions are possible and are exemplified 
below. 
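
The tabulation, optional normalization and sort described in 
this paragraph might look as follows in Python. This is a 
sketch only: the transcript_image helper and the gene 
identifiers are hypothetical, and it assumes one gene 
identifier has already been assigned to each sequenced 
clone. 

```python
from collections import Counter
from typing import Iterable, List, Tuple


def transcript_image(gene_ids: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (gene, copies, fraction of data set) rows, most abundant first."""
    counts = Counter(gene_ids)
    total = sum(counts.values())
    # most_common() yields genes in descending order of copy number
    return [(gene, n, n / total) for gene, n in counts.most_common()]


# One hypothetical gene identifier per sequenced clone.
clones = ["LOCUS_A", "LOCUS_B", "LOCUS_A", "LOCUS_C", "LOCUS_A", "LOCUS_B"]
image = transcript_image(clones)
```

Here the first row is LOCUS_A with 3 of 6 clones, that is, 
half of the data set. Dividing by the total makes libraries 
of different sizes comparable. 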

An alternate method of producing a gene transcript 
image includes the steps of obtaining a mixture of test 
mRNA and providing a representative array of unique probes 
whose sequences are complementary to at least some of the 
test mRNAs. Next, a fixed amount of the test mRNA is added 
to the arrayed probes. The test mRNA is incubated with the 
probes for a sufficient time to allow hybrids of the test 
mRNA and probes to form. The mRNA-probe hybrids are 
detected and the quantity determined. The hybrids are 
identified by their location in the probe array. The 
quantity of each hybrid is summed to give a population 
number. Each hybrid quantity is divided by the population 
number to provide a set of relative abundance data termed a 
gene transcript image analysis. 
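
The final two steps of this alternate method (summing the 
hybrid quantities and dividing each by the population 
number) can be written out as below. This is a sketch 
assuming the detected quantities are available as one number 
per array position; the relative_abundance helper, the 
position labels and the signal values are hypothetical. 

```python
from typing import Dict


def relative_abundance(hybrid_quantities: Dict[str, float]) -> Dict[str, float]:
    """Divide each hybrid quantity by the summed population number."""
    population_number = sum(hybrid_quantities.values())
    if population_number <= 0:
        raise ValueError("no hybrid signal detected")
    return {
        position: quantity / population_number
        for position, quantity in hybrid_quantities.items()
    }


# Hypothetical detected quantity per probe position in the array.
signals = {"A1": 120.0, "A2": 30.0, "B1": 50.0}
array_image = relative_abundance(signals)
```

The resulting values sum to 1, so images from arrays loaded 
with different total signal can be compared directly. 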

6. EXAMPLES 

The examples below are provided to illustrate the 
subject invention. These examples are provided by way of 
illustration and are not included for the purpose of 
limiting the invention. 

6.1. TISSUE SOURCES AND CELL LINES 

For analysis with the computer program claimed herein, 
biological sequences can be obtained from virtually any 
source. Most popular are tissues obtained from the human 
body. Tissues can be obtained from any organ of the body, 
any age donor, any abnormality or any immortalized cell 
line. Immortal cell lines may be preferred in some 
instances because of their purity of cell type; other 
tissue samples invariably include mixed cell types. A 
special technique is available to take a single cell (for 
example, a brain cell) and harness the cellular machinery 
to grow up sufficient cDNA for sequencing by the techniques 
and analysis described herein (cf. U.S. Patent Nos. 
5,021,335 and 5,168,038, which are incorporated by 
reference). The examples given herein utilized the 
following immortalized cell lines: monocyte-like U-937 
cells, activated macrophage-like THP-1 cells, induced 
vascular endothelial cells (HUVEC cells) and mast cell-like 
HMC-1 cells. 

The U-937 cell line is a human histiocytic lymphoma 
cell line with monocyte characteristics, established from 
malignant cells obtained from the pleural effusion of a 
patient with diffuse histiocytic lymphoma (Sundstrom, C. 
and Nilsson, K. (1976) Int. J. Cancer 17:565). U-937 is 
one of only a few human cell lines with the morphology, 
cytochemistry, surface receptors and monocyte-like 
characteristics of histiocytic cells. These cells can be 
induced to terminal monocytic differentiation and will 
express new cell surface molecules when activated with 
supernatants from human mixed lymphocyte cultures. Upon 
this type of in vitro activation, the cells undergo 
morphological and functional changes, including 
augmentation of antibody-dependent cellular cytotoxicity 
(ADCC) against erythroid and tumor target cells (one of the 
principal functions of macrophages). Activation of U-937 
cells with phorbol 12-myristate 13-acetate (PMA) in vitro 
stimulates the production of several compounds, including 
prostaglandins, leukotrienes and platelet-activating factor 
(PAF), which are potent inflammatory mediators. Thus, 
U-937 is a cell line that is well suited for the 
identification and isolation of gene transcripts associated 
with normal monocytes. 

The HUVEC cell line is a normal, homogeneous, well 
characterized, early passage endothelial cell culture from 
human umbilical vein (Cell Systems Corp., 12815 NE 124th 
Street, Kirkland, WA 98034). Only gene transcripts from 
induced, or treated, HUVEC cells were sequenced. One batch 
of 1 x 10^8 cells was treated for 5 hours with 1 U/ml rIL-1b 
and 100 ng/ml E. coli lipopolysaccharide (LPS) endotoxin 
prior to harvesting. A separate batch of 2 x 10^8 cells was 
treated at confluence with 4 U/ml TNF and 2 U/ml 
interferon-gamma (IFN-gamma) prior to harvesting. 

THP-1 is a human leukemic cell line with distinct 
monocytic characteristics. This cell line was derived from 
the blood of a 1-year-old boy with acute monocytic leukemia 
(Tsuchiya, S. et al. (1980) Int. J. Cancer: 171-76). The 
following cytological and cytochemical criteria were used 
to determine the monocytic nature of the cell line: 1) the 
presence of alpha-naphthyl butyrate esterase activity which 
could be inhibited by sodium fluoride; 2) the production of 
lysozyme; 3) the phagocytosis of latex particles and 
sensitized SRBC (sheep red blood cells); and 4) the ability 
of mitomycin C-treated THP-1 cells to activate 
T-lymphocytes following ConA (concanavalin A) treatment. 
Morphologically, the cytoplasm contained small azurophilic 
granules and the nucleus was indented and irregularly 
shaped with deep folds. The cell line had Fc and C3b 
receptors, probably functioning in phagocytosis. THP-1 
cells treated with the tumor promoter 
12-O-tetradecanoyl-phorbol-13-acetate (TPA) stop proliferating and 
differentiate into macrophage-like cells which mimic native 
monocyte-derived macrophages in several respects. 

Morphologically, as the cells change shape, the nucleus 
becomes more irregular and additional phagocytic vacuoles 
appear in the cytoplasm. The differentiated THP-1 cells 
also exhibit an increased adherence to tissue culture 
plastic. 

HMC-1 cells (a human mast cell line) were established 
from the peripheral blood of a Mayo Clinic patient with 
mast cell leukemia (Leukemia Res. (1988) 12:345-55). The 
cultured cells looked similar to immature cloned murine 
mast cells, contained histamine, and stained positively for 
chloroacetate esterase, amino caproate esterase, eosinophil 
major basic protein (MBP) and tryptase. The HMC-1 cells 
have, however, lost the ability to synthesize normal IgE 
receptors. HMC-1 cells also possess a 10;16 translocation, 
present in cells initially collected by leukophoresis from 
the patient and not an artifact of culturing. Thus, HMC-1 
cells are a good model for mast cells. 

6.2. CONSTRUCTION OF cDNA LIBRARIES 

For inter-library comparisons, the libraries must be 
prepared in similar manners. Certain parameters appear to 
be particularly important to control. One such parameter 
is the method of isolating mRNA. It is important to use 
the same conditions to remove DNA and heterogeneous nuclear 
RNA from comparison libraries. Size fractionation of cDNA 
must be carefully controlled. The same vector preferably 
should be used for preparing libraries to be compared. At 
the very least, the same type of vector (e.g., 
unidirectional vector) should be used to assure a valid 
comparison. A unidirectional vector may be preferred in 
order to more easily analyze the output. 

It is preferred to prime only with oligo dT 
unidirectional primer in order to obtain only one clone per 
mRNA transcript when obtaining cDNAs. However, it is 
recognized that employing a mixture of oligo dT and random 
primers can also be advantageous because such a mixture 
results in more sequence diversity when gene discovery also 
is a goal. Similar effects can be obtained with DR2 
(Clontech) and HXLOX (US Biochemical) and also vectors from 
Invitrogen and Novagen. These vectors have two 
requirements. First, there must be primer sites for 
commercially available primers such as T3 or M13 reverse 
primers. Second, the vector must accept inserts up to 10 
kB. 

It also is important that the clones be randomly 
sampled, and that a significant population of clones is 
used. Data have been generated with 5,000 clones; however, 
if very rare genes are to be obtained and/or their relative 
abundance determined, as many as 100,000 clones from a 
single library may need to be sampled. Size fractionation 
of cDNA also must be carefully controlled. Alternately, 
plaques can be selected, rather than clones. 
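
A rough calculation illustrates why the number of sampled 
clones matters for rare genes. The sketch below assumes 
clones are drawn independently and at random from a library 
much larger than the sample, so that a transcript present at 
frequency f is seen at least once in n clones with 
probability 1 - (1 - f)^n. The detection_probability helper 
and the frequency of 1 in 100,000 are hypothetical choices 
made for illustration and are not figures from this 
document. 

```python
def detection_probability(frequency: float, clones_sampled: int) -> float:
    """Chance of seeing a transcript at least once among randomly sampled clones."""
    return 1.0 - (1.0 - frequency) ** clones_sampled


rare = 1e-5  # hypothetical transcript: 1 clone in 100,000
p_small = detection_probability(rare, 5_000)    # roughly 0.05
p_large = detection_probability(rare, 100_000)  # roughly 0.63
```

Under these assumptions, a 5,000-clone sample would usually 
miss such a transcript, while a 100,000-clone sample would 
detect it more often than not. 
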
Besides the Uni-ZAP™ vector system by Stratagene 
disclosed below, it is now believed that other similarly 
unidirectional vectors also can be used. For example, it 
is believed that such vectors include but are not limited 
to DR2 (Clontech), and HXLOX (U.S. Biochemical). 

Preferably, the details of library construction (as 
shown in Figure 1) are collected and stored in a database 
for later retrieval relative to the sequences being 
compared. Fig. 1 shows important information regarding the 
library collaborator or cell or cDNA supplier, 
pretreatment, biological source, culture, mRNA preparation 
and cDNA construction. Similarly detailed information 
about the other steps is beneficial in analyzing sequences 
and libraries in depth. 

RNA must be harvested from cells and tissue samples 
and cDNA libraries are subsequently constructed. cDNA 
libraries can be constructed according to techniques known 
in the art. (See, for example, Maniatis, T. et al. (1982) 
Molecular Cloning, Cold Spring Harbor Laboratory, New 
York). cDNA libraries may also be purchased. The U-937 
cDNA library (catalog No. 937207) was obtained from 
Stratagene, Inc., 11099 N. Torrey Pines Rd., La Jolla, CA 
92037. 

The THP-1 cDNA library was custom constructed by 
Stratagene from THP-1 cells cultured 48 hours with 100 nM 
TPA and 4 hours with 1 µg/ml LPS. The human mast cell 
HMC-1 cDNA library was also custom constructed by Stratagene 
from cultured HMC-1 cells. The HUVEC cDNA library was 
custom constructed by Stratagene from two batches of 
induced HUVEC cells which were separately processed. 

Essentially, all the libraries were prepared in the 
same manner. First, poly(A+) RNA (mRNA) was purified. For 
the U-937 and HMC-1 RNA, cDNA synthesis was only primed 
with oligo dT. For the THP-1 and HUVEC RNA, cDNA synthesis 
was primed separately with both oligo dT and random 
hexamers, and the two cDNA libraries were treated 
separately. Synthetic adaptor oligonucleotides were 
ligated onto cDNA ends enabling its insertion into the 
Uni-ZAP™ vector system (Stratagene), allowing high efficiency 
unidirectional (sense orientation) lambda library 
construction and the convenience of a plasmid system with 
blue-white color selection to detect clones with cDNA 
insertions. Finally, the two libraries were combined into 
a single library by mixing equal numbers of bacteriophage. 

The libraries can be screened with either DNA probes 
or antibody probes and the pBluescript® phagemid 
(Stratagene) can be rapidly excised in vivo. The phagemid 
allows the use of a plasmid system for easy insert 
characterization, sequencing, site-directed mutagenesis, 
the creation of unidirectional deletions and expression of 
fusion proteins. The custom-constructed library phage 
particles were infected into E. coli host strain XL1-Blue® 
(Stratagene), which has a high transformation efficiency, 
increasing the probability of obtaining rare, 
under-represented clones in the cDNA library. 



6.3. ISOLATION OF cDNA CLONES 

The phagemid forms of individual cDNA clones were 
obtained by the in vivo excision process, in which the host 
bacterial strain was coinfected with both the lambda 
library phage and an f1 helper phage. Proteins derived 
from both the library-containing phage and the helper phage 
nicked the lambda DNA, initiated new DNA synthesis from 
defined sequences on the lambda target DNA and created a 
smaller, single stranded circular phagemid DNA molecule 
that included all DNA sequences of the pBluescript® plasmid 
and the cDNA insert. The phagemid DNA was secreted from 
the cells and purified, then used to re-infect fresh host 
cells, where the double stranded phagemid DNA was produced. 
Because the phagemid carries the gene for beta-lactamase, 
the newly-transformed bacteria are selected on medium 
containing ampicillin. 

Phagemid DNA was purified using the Magic Minipreps™ 
DNA Purification System (Promega catalogue #A7100, Promega 
Corp., 2800 Woods Hollow Rd., Madison, WI 53711). This 
small-scale process provides a simple and reliable method 
for lysing the bacterial cells and rapidly isolating 
purified phagemid DNA using a proprietary DNA-binding 
resin. The DNA was eluted from the purification resin 
already prepared for DNA sequencing and other analytical 
manipulations. 

Phagemid DNA was also purified using the QIAwell-8 
Plasmid Purification System from QIAGEN® DNA Purification 
System (QIAGEN Inc., 9259 Eton Ave., Chatsworth, CA 
91311). This product line provides a convenient, rapid and 
reliable high-throughput method for lysing the bacterial 
cells and isolating highly purified phagemid DNA using 
QIAGEN anion-exchange resin particles with EMPORE™ membrane 
technology from 3M in a multiwell format. The DNA was 
eluted from the purification resin already prepared for DNA 
sequencing and other analytical manipulations. 

An alternate method of purifying phagemid has recently 
become available. It utilizes the Miniprep Kit (Catalog 
No. 77468, available from Advanced Genetic Technologies 
Corp., 19212 Orbit Drive, Gaithersburg, Maryland). This 
kit is in the 96-well format and provides enough reagents 
for 960 purifications. Each kit is provided with a 
recommended protocol, which has been employed except for 
the following changes. First, the 96 wells are each filled 
with only 1 ml of sterile terrific broth with carbenicillin 
at 25 mg/L and glycerol at 0.4%. After the wells are 
inoculated, the bacteria are cultured for 24 hours and 
lysed with 60 µl of lysis buffer. A centrifugation step 
(2900 rpm for 5 minutes) is performed before the contents 
of the block are added to the primary filter plate. The 
optional step of adding isopropanol to TRIS buffer is not 
routinely performed. After the last step in the protocol, 
samples are transferred to a Beckman 96-well block for 
storage. 

Another new DNA purification system is the WIZARD™ 
product line which is available from Promega (catalog No. 
A7071) and may be adaptable to the 96-well format. 

6.4. SEQUENCING OF cDNA CLONES 

The cDNA inserts from random isolates of the U-937 and 
THP-1 libraries were sequenced in part. Methods for DNA 
sequencing are well known in the art. Conventional 
enzymatic methods employ DNA polymerase Klenow fragment, 
Sequenase™ or Taq polymerase to extend DNA chains from an 
oligonucleotide primer annealed to the DNA template of 
interest. Methods have been developed for the use of both 
single- and double-stranded templates. The chain 
termination reaction products are usually electrophoresed 
on urea-acrylamide gels and are detected either by 
autoradiography (for radionuclide-labeled precursors) or by 
fluorescence (for fluorescent-labeled precursors). Recent 
improvements in mechanized reaction preparation, sequencing 
and analysis using the fluorescent detection method have 
permitted expansion in the number of sequences that can be 
determined per day (such as the Applied Biosystems 373 and 
377 DNA sequencer, Catalyst 800). Currently with the 
system as described, read lengths range from 250 to 400 
bases and are clone dependent. Read length also varies 
with the length of time the gel is run. In general, the 
shorter runs tend to truncate the sequence. A minimum of 
only about 25 to 50 bases is necessary to establish the 
identification and degree of homology of the sequence. 
Gene transcript imaging can be used with any sequence- 
specific method, including, but not limited to 
hybridization, mass spectroscopy, capillary electrophoresis 
and 505 gel electrophoresis. 



6.5. HOMOLOGY SEARCHING OF cDNA CLONE AND 
DEDUCED PROTEIN (and Subsequent Steps) 

Using the nucleotide sequences derived from the cDNA 
clones as query sequences (sequences of a Sequence 
Listing), databases containing previously identified 
sequences are searched for areas of homology (similarity). 
Examples of such databases include Genbank and EMBL. We 
next describe examples of two homology search algorithms 
that can be used, and then describe the subsequent 
computer-implemented steps to be performed in accordance 
with preferred embodiments of the invention. 

In the following description of the computer- 
implemented steps of the invention, the word "library" 
denotes a set (or population) of biological specimen 
nucleic acid sequences. A "library" can consist of cDNA 
sequences, RNA sequences, or the like, which characterize a 
biological specimen. The biological specimen can consist 
of cells of a single human cell type (or can be any of the 
other above-mentioned types of specimens). We contemplate 
that the sequences in a library have been determined so as 
to accurately represent or characterize a biological 
specimen (for example, they can consist of representative 
cDNA sequences from clones of RNA taken from a single human 
cell). 

In the following description of the computer- 
implemented steps of the invention, the expression 
"database" denotes a set of stored data which represent a 
collection of sequences, which in turn represent a 
collection of biological reference materials. For example, 
a database can consist of data representing many stored 
cDNA sequences which are in turn representative of human 
cells infected with various viruses, cells of humans of 
various ages, cells from different mammalian species, and 
so on. 

In preferred embodiments, the invention employs a 
computer programmed with software (to be described) for 
performing the following steps: 

(a) processing data indicative of a library of cDNA 
sequences (generated as a result of high-throughput cDNA 
sequencing or other method) to determine whether each 
sequence in the library matches a DNA sequence of a 
reference database of DNA sequences (and if so, identifying 
the reference database entry which matches the sequence and 
indicating the degree of match between the reference 
sequence and the library sequence) and assigning an 
identified sequence value based on the sequence annotation 
and degree of match to each of the sequences in the 
library; 

(b) for some or all entries of the database, 
tabulating the number of matching identified sequence 
values in the library (although this can be done by human 
hand from a printout of all entries, we prefer to perform 
this step using computer software to be described below), 
thereby generating a set of final data values or "abundance 
numbers"; and 

(c) if the libraries are different sizes, dividing 
each abundance number by the total number of sequences in 
the library, to obtain a relative abundance number for each 
identified sequence value (i.e., a relative abundance of 
each gene transcript). 
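
Steps (b) and (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the abundance sort program of Table 5: the function names and the example gene labels are hypothetical, and the input is assumed to be the list of identified sequence values already produced by step (a), one per sequenced clone.

```python
from collections import Counter

def abundance_numbers(identified_values):
    """Step (b): count how many library sequences received each identified sequence value."""
    return Counter(identified_values)

def relative_abundance(counts):
    """Step (c): divide each abundance number by the total number of sequences in the library."""
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

# Hypothetical output of step (a) for a five-clone library.
library = ["actin", "ferritin", "actin", "no match", "actin"]
counts = abundance_numbers(library)
relative = relative_abundance(counts)
```

Because the relative values are fractions of the library, they can be compared between libraries of different sizes, which is the purpose of step (c).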

The list of identified sequence values (or genes 
corresponding thereto) can then be sorted by abundance in 
the cDNA population. A multitude of additional types of 
comparisons or dimensions are possible. 

For example (to be described below in greater detail), 
steps (a) and (b) can be repeated for two different 
libraries (sometimes referred to as a "target" library and 
a "subtractant" library). Then, for each identified 
sequence value, a "ratio" value is 
obtained by dividing the abundance number (for that 
identified sequence value) for the target library, by the 
abundance number (for that identified sequence value) for 
the subtractant library. 

In fact, subtraction may be carried out on multiple 
libraries. It is possible to add the transcripts from 
several libraries (for example, three) and then to divide 
them by another set of transcripts from multiple libraries 
(again, for example, three). Notation for this operation 
may be abbreviated as (A+B+C)/(D+E+F), where the capital 
letters each indicate an entire library. Optionally the 
abundance numbers of transcripts in the summed libraries 
may be divided by the total sample size before subtraction. 
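
The pooled operation written (A+B+C)/(D+E+F) can be illustrated as follows. This is a minimal sketch under our own assumptions: each library is represented as a mapping from identified sequence value to abundance number, the helper names pool and pooled_ratios are hypothetical, and entries with no counts in the denominator pool are simply skipped because the text does not say how they should be handled.

```python
from collections import Counter

def pool(libraries, normalize=False):
    """Sum abundance numbers across libraries; optionally divide by each library's size first."""
    pooled = Counter()
    for lib in libraries:
        size = sum(lib.values())
        for value, n in lib.items():
            pooled[value] += n / size if normalize else n
    return pooled

def pooled_ratios(numerator_libs, denominator_libs, normalize=False):
    """Compute (A+B+...)/(D+E+...) for every value present in both pools."""
    top = pool(numerator_libs, normalize)
    bottom = pool(denominator_libs, normalize)
    # Values absent from the denominator pool are skipped (assumption, see lead-in).
    return {v: top[v] / bottom[v] for v in top if bottom.get(v, 0) > 0}

# Hypothetical abundance tables for four small libraries.
A = {"g1": 4, "g2": 1}
B = {"g1": 2}
D = {"g1": 3, "g2": 1}
E = {"g2": 1}
result = pooled_ratios([A, B], [D, E])
```

Passing normalize=True corresponds to the optional division by total sample size mentioned above.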

Unlike standard hybridization technology which permits 
a single subtraction of two libraries, once one has 
processed a set of library transcript sequences and stored 
them in the computer, any number of subtractions can be 
performed on the library. For example, by this method, 
ratio values can be obtained by dividing relative abundance 
values in a first library by corresponding values in a 
second library and vice versa. 

In variations on step (a), the library consists of 
nucleotide sequences derived from cDNA clones. Examples of 
databases which can be searched for areas of homology 
(similarity) in step (a) include the commercially available 
databases known as Genbank (NIH), EMBL (European Molecular 
Biology Labs, Germany), and GENESEQ (Intelligenetics, 
Mountain View, California). 

One homology search algorithm which can be used to 
implement step (a) is the algorithm described in the paper 
by D.J. Lipman and W.R. Pearson, entitled "Rapid and 
Sensitive Protein Similarity Searches," Science, 227:1435 
(1985). In this algorithm, the homologous regions are 
searched in a two-step manner. In the first step, the 
highest homologous regions are determined by calculating a 
matching score using a homology score table. The parameter 
"Ktup" is used in this step to establish the minimum window 
size to be shifted for comparing two sequences. Ktup also 
sets the number of bases that must match to extract the 
highest homologous region among the sequences. In this 
step, no insertions or deletions are applied and the 
homology is displayed as an initial (INIT) value. 

In the second step, the homologous regions are aligned 
to obtain the highest matching score by inserting a gap in 
order to add a probable deleted portion. The matching 
score obtained in the first step is recalculated using the 
homology score Table and the insertion score Table to an 
optimized (OPT) value in the final output. 
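
The role of Ktup in the first step can be illustrated with a simplified sketch. It is not the published algorithm: no homology score table is applied and no INIT or OPT value is computed. It only counts exact Ktup-length matches and groups them by diagonal (the subject position minus the query position), which is where a gap-free region of similarity shows up. The function names and example sequences are hypothetical.

```python
from collections import defaultdict

def ktup_diagonal_hits(query, subject, ktup=4):
    """Count exact ktup-length matches between query and subject on each diagonal."""
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)
    diagonals = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1  # same diagonal means same relative offset, no gaps
    return dict(diagonals)

def best_diagonal(query, subject, ktup=4):
    """Return (diagonal, hit count) for the diagonal with the most ktup matches, or None."""
    hits = ktup_diagonal_hits(query, subject, ktup)
    if not hits:
        return None
    return max(hits.items(), key=lambda item: item[1])

best = best_diagonal("ACGTACGGTC", "TTACGTACGGTCAA", ktup=4)
```

A larger ktup makes the lookup faster and more selective, while a smaller ktup finds weaker similarities at the cost of more chance matches.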
DNA homologies between two sequences can be examined 
graphically using the Harr method of constructing dot 
matrix homology plots (Needleman, S.B. and Wunsch, C.D., 
J. Mol. Biol. 48:443 (1970)). This method produces a 
two-dimensional plot which can be useful in determining 
regions of homology versus regions of repetition. 

However, in a class of preferred embodiments, step (a) 
is implemented by processing the library data in the 
commercially available computer program known as the 
INHERIT 670 Sequence Analysis System, available from 
Applied Biosystems Inc. (Foster City, California), 
including the software known as the Factura software (also 
available from Applied Biosystems Inc.). The Factura 
program preprocesses each library sequence to "edit out" 
portions thereof which are not likely to be of interest, 
such as the vector used to prepare the library. Additional 
sequences which can be edited out or masked (ignored by the 
search tools) include but are not limited to the polyA tail 
and repetitive GAG and CCC sequences. A low-end search 
program can be written to mask out such "low-information" 
sequences, or programs such as BLAST can ignore the 
low-information sequences. 
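
A masking step of the kind just described might look like the sketch below. It is an illustration only and does not reproduce what Factura or BLAST actually do: the function name, the choice of "N" as the mask character, and the ten-base minimum run length are all our own assumptions, and only runs of A (the polyA tail case) are handled.

```python
import re

def mask_poly_a(sequence, min_run=10, mask_char="N"):
    """Replace every run of at least min_run consecutive A bases with mask_char."""
    pattern = re.compile("A{%d,}" % min_run)
    return pattern.sub(lambda m: mask_char * len(m.group()), sequence.upper())

masked = mask_poly_a("ACGT" + "A" * 12)
```

Masking rather than deleting keeps the coordinates of the remaining bases unchanged, so a later match can still be reported at its original position.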

In the algorithm implemented by the INHERIT 670 
Sequence Analysis System, the Pattern Specification 
Language (developed by TRW Inc.) is used to determine 
regions of homology. "There are three parameters that 
determine how INHERIT analysis runs sequence comparisons: 
window size, window offset and error tolerance. Window 
size specifies the length of the segments into which the 
query sequence is subdivided. Window offset specifies 
where to start the next segment [to be compared], counting 
from the beginning of the previous segment. Error 
tolerance specifies the total number of insertions, 
deletions and/or substitutions that are tolerated over the 
specified word length. Error tolerance may be set to any 
integer between 0 and 6. The default settings are window 
tolerance=20, window offset=10 and error tolerance=3." 
INHERIT Analysis Users Manual, pp. 2-15, Version 1.0, 
Applied Biosystems, Inc., October 1991. 
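
The three parameters can be pictured with the following sketch, which is not the INHERIT implementation or the Pattern Specification Language. We assume that the quoted default of 20 refers to the window size, and the tolerance check below counts substitutions only; insertions and deletions, which the manual also counts, are not modelled. The function names are hypothetical.

```python
def query_windows(query, window_size=20, window_offset=10):
    """Yield (start, segment) pairs; each segment starts window_offset after the previous one."""
    last_start = max(len(query) - window_size, 0)
    for start in range(0, last_start + 1, window_offset):
        yield start, query[start:start + window_size]

def within_tolerance(segment, candidate, error_tolerance=3):
    """Substitution-only check: True if equal-length strings differ at no more than error_tolerance positions."""
    if len(segment) != len(candidate):
        return False
    mismatches = sum(a != b for a, b in zip(segment, candidate))
    return mismatches <= error_tolerance

starts = [start for start, _ in query_windows("A" * 50)]
```

With an offset smaller than the window size, consecutive segments overlap, so a region of homology that straddles a segment boundary is still covered by at least one window.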

Using a combination of these three parameters, a 
database (such as a DNA database) can be searched for 
sequences containing regions of homology and the 
appropriate sequences are scored with an initial value. 
Subsequently, these homologous regions are examined using 
dot matrix homology plots to determine regions of homology 
versus regions of repetition. Smith-Waterman alignments 
can be used to display the results of the homology search. 
The INHERIT software can be executed by a Sun computer 
system programmed with the UNIX operating system. 

Search alternatives to INHERIT include the BLAST 
program, GCG (available from the Genetics Computer Group, 
WI) and the Dasher program (Temple Smith, Boston 
University, Boston, MA). Nucleotide sequences can be 
searched against Genbank, EMBL or custom databases such as 
GENESEQ (available from Intelligenetics, Mountain View, CA) 
or other databases for genes. In addition, we have 
searched some sequences against our own in-house database. 

In preferred embodiments, the transcript sequences are 
analyzed by the INHERIT software for best conformance with 
a reference gene transcript to assign a sequence identifier 
and assigned the degree of homology, which together are the 
identified sequence value and are input into, and further 
processed by, a Macintosh personal computer (available from 
Apple) programmed with an "abundance sort and subtraction 
analysis" computer program (to be described below). 

Prior to the abundance sort and subtraction analysis 
program (also denoted as the "abundance sort" program), 
identified sequences from the cDNA clones are assigned 
value (according to the parameters given above) by degree 
of match according to the following categories: "exact" 
matches (regions with a high degree of identity), 
homologous human matches (regions of high similarity, but 
not "exact" matches), homologous non-human matches (regions 
of high similarity present in species other than human), or 
non matches (no significant regions of homology to 
previously identified nucleotide sequences stored in the 
form of the database). Alternately, the degree of match 
can be a numeric value as described below. 

With reference again to the step of identifying 
matches between reference sequences and database entries, 
protein and peptide sequences can be deduced from the 
nucleic acid sequences. Using the deduced polypeptide 
sequence, the match identification can be performed in a 
manner analogous to that done with cDNA sequences. A 
protein sequence is used as a query sequence and compared 
to the previously identified sequences contained in a 
database such as the Swiss/Prot, PIR and the NBRF Protein 
database to find homologous proteins. These proteins are 
initially scored for homology using a homology score Table 
(Orcutt, B.C. and Dayhoff, M.O. Scoring Matrices, PIR 
Report MAT-0285 (February 1985)) resulting in an INIT 
score. The homologous regions are aligned to obtain the 
highest matching scores by inserting a gap which adds a 
probable deleted portion. The matching score is 
recalculated using the homology score Table and the 
insertion score Table resulting in an optimized (OPT) 
score. Even in the absence of knowledge of the proper 
reading frame of an isolated sequence, the above-described 
protein homology search may be performed by searching all 3 
reading frames. 
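
Deducing a protein query in each of the three reading frames can be sketched as follows. This is an illustration under our own assumptions: it uses the standard genetic code, translates the given strand only (as the text describes three frames), reports any codon containing an ambiguous base as "X", and uses hypothetical function names.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(sequence, frame):
    """Translate the given strand starting at offset frame (0, 1 or 2)."""
    sequence = sequence.upper()
    codons = (sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def three_frame_translations(sequence):
    """Return the deduced peptide for each of the three forward reading frames."""
    return [translate(sequence, frame) for frame in range(3)]

frames = three_frame_translations("ATGGCCTAA")
```

Each of the three deduced peptides can then be submitted as a separate query, and the frame giving the best match is taken as the likely reading frame.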

Peptide and protein sequence homologies can also be 
ascertained using the INHERIT 670 Sequence Analysis System 
in an analogous way to that used in DNA sequence 
homologies. Pattern Specification Language and parameter 
windows are used to search protein databases for sequences 
containing regions of homology which are scored with an 
initial value. Subsequent display in a dot-matrix homology 
plot shows regions of homology versus regions of 
repetition. Additional search tools that are available to 
use on pattern search databases include PLsearch Blocks 
(available from Henikoff & Henikoff, University of 
Washington, Seattle), Dasher and GCG. Pattern search 
databases include, but are not limited to, Protein Blocks 
(available from Henikoff & Henikoff, University of 
Washington, Seattle), Brookhaven Protein (available from 
the Brookhaven National Laboratory, Brookhaven, MA), 
PROSITE (available from Amos Bairoch, University of Geneva, 
Switzerland), ProDom (available from Temple Smith, Boston 
University), and PROTEIN MOTIF FINGERPRINT (available from 
University of Leeds, United Kingdom). 

The ABI Assembler application software, part of the 
INHERIT DNA analysis system (available from Applied 
Biosystems, Inc., Foster City, CA), can be employed to 
create and manage sequence assembly projects by assembling 
data from selected sequence fragments into a larger 
sequence. The Assembler software combines two advanced 
computer technologies which maximize the ability to 
assemble sequenced DNA fragments into Assemblages, a 
special grouping of data where the relationships between 
sequences are shown by graphic overlap, alignment and 
statistical views. The process is based on the 
Meyers-Kececioglu model of fragment assembly (INHERIT™ 
Assembler User's Manual, Applied Biosystems, Inc., Foster 
City, CA), and uses graph theory as the foundation of a 
very rigorous multiple sequence alignment engine for 
assembling DNA sequence fragments. Other assembly programs 
that can be used include MEGALIGN (available from DNASTAR 
Inc., Madison, WI), Dasher and STADEN (available from Roger 
Staden, Cambridge, England). 

Next, with reference to Fig. 2, we describe in more 
detail the "abundance sort" program which implements above- 
mentioned "step (b)" to tabulate the number of sequences of 
the library which match each database entry (the "abundance 
number" for each database entry). 

Fig. 2 is a flow chart of a preferred embodiment of 
the abundance sort program. A source code listing of this 
embodiment of the abundance sort program is set forth in 
Table 5. In the Table 5 implementation, the abundance sort 
program is written using the FoxBASE programming language 
commercially available from Microsoft Corporation. 
Although FoxBASE was the program chosen for the first 
iteration of this technology, it should not be considered 
limiting. Many other programming languages, Sybase being a 
particularly desirable alternative, can also be used, as 
will be obvious to one with ordinary skill in the art. The 
subroutine names specified in Fig. 2 correspond to 
subroutines listed in Table 5. 

With reference again to Fig. 2, the "Identified 
Sequences" are transcript sequences representing each 
sequence of the library and a corresponding identification 
of the database entry (if any) which it matches. In other 
words, the "Identified Sequences" are transcript sequences 
representing the output of above-discussed "step (a)." 

Fig. 3 is a block diagram of a system for implementing 
the invention. The Fig. 3 system includes library 
generation unit 2 which generates a library and asserts an 
output stream of transcript sequences indicative of the 
biological sequences comprising the library. Programmed 
processor 4 receives the data stream output from unit 2 and 
processes this data in accordance with above-discussed 
"step (a)" to generate the Identified Sequences. Processor 
4 can be a processor programmed with the commercially 
available computer program known as the INHERIT 670 
Sequence Analysis System and the commercially available 
computer program known as the Factura program (both 
available from Applied Biosystems Inc.) and with the UNIX 
operating system. 

Still with reference to Fig. 3, the Identified 
Sequences are loaded into processor 6 which is programmed 
with the abundance sort program. Processor 6 generates the 
Final Transcript sequences indicated in both Figs. 2 and 3. 
Fig. 4 shows a more detailed block diagram of a planned 
relational computer system, including various searching 
techniques which can be implemented, along with an 
assortment of databases to query against. 

With reference to Fig. 2, the abundance sort program 
first performs an operation known as "Tempnum" on the 
Identified Sequences, to discard all of the Identified 
Sequences except those which match database entries of 
selected types. For example, the Tempnum process can 
select Identified Sequences which represent matches of the 
following types with database entries (see above for 
definition): "exact" matches, human "homologous" matches, 
"other species" matches (representing genes present in 
species other than human), "no" matches (no significant 
regions of homology with database entries representing 
previously identified nucleotide sequences), "I" matches 
(Incyte for not previously known DNA sequences), or "X" 
matches (matches ESTs in reference database). This 
eliminates the U, S, M, V, A, R and D sequences (see Table 1 
for definitions). 
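
The selection performed by "Tempnum" amounts to a filter on the match type of each Identified Sequence. The sketch below is not the FoxBASE subroutine of Table 5. The record layout and the function name are hypothetical, and while the letters I, X, U, S, M, V, A, R and D come from the text above, the letters E, H, O and N used here for the exact, homologous, other species and no-match types are placeholders of our own, since Table 1 is not reproduced in this section.

```python
# I and X are named in the text; E, H, O and N are placeholder letters (assumption).
KEEP_CODES = {"E", "H", "O", "N", "I", "X"}

def tempnum(identified_sequences, keep=KEEP_CODES):
    """Keep only the records whose match code is one of the selected types."""
    return [record for record in identified_sequences if record["match_code"] in keep]

# Hypothetical Identified Sequences, one record per sequenced clone.
records = [
    {"clone": "c001", "entry": "entry_A", "match_code": "E"},
    {"clone": "c002", "entry": "entry_B", "match_code": "V"},
    {"clone": "c003", "entry": None, "match_code": "N"},
    {"clone": "c004", "entry": "entry_C", "match_code": "R"},
]
selected = tempnum(records)
```

Changing the set of kept codes changes which classes of sequence reach the later sorting operations without altering those operations themselves.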

The identified sequence values selected during the 
"Tempnum" process then undergo a further selection (weeding 
out) operation known as "Tempred." This operation can, for 
example, discard all identified sequence values 
representing matches with selected database entries. 

The identified sequence values selected during the 
"Tempred" process are then classified according to library 
during the "Tempdesig" operation. It is contemplated that 
the "Identified Sequences" can represent sequences from a 
single library, or from two or more libraries. 

Consider first the case that the identified sequence 
values represent sequences from a single library. In this 
case, all the identified sequence values determined during 
"Tempred" undergo sorting in the "Templib" operation, 
further sorting in the "Libsort" operation, and finally 
additional sorting in the "Temptarsort" operation. For 
example, these three sorting operations can sort the 
identified sequences in order of decreasing "abundance 
number" (to generate a list of decreasing abundance 
numbers, each abundance number corresponding to a unique 
identified sequence entry, or several lists of decreasing 
abundance numbers, with the abundance numbers in each list 
corresponding to database entries of a selected type) with 
redundancies eliminated from each sorted list. In this 
case, the operation identified as "Cruncher" can be 
bypassed, so that the "Final Data" values are the organized 
transcript sequences produced during the "Temptarsort" 
operation. 

We next consider the case that the transcript 
sequences produced during the "Tempred" operation represent 
sequences from two libraries (which we will denote the 
"target" library and the "subtractant" library) . For 
example, the target library may consist of cDNA sequences 
from clones of a diseased cell, while the subtractant 
library may consist of cDNA sequences from clones of the 
diseased cell after treatment by exposure to a drug. For 
another example, the target library may consist of cDNA 
sequences from clones of a cell type from a young human, 
while the subtractant library may consist of cDNA sequences 
from clones of the same cell type from the same human at 
different ages. 

In this case, the "Tempdesig" operation routes all 
transcript sequences representing the target library for 
processing in accordance with "Templib" (and then "Libsort" 
and "Temptarsort"), and routes all transcript sequences 
representing the subtractant library for processing in 
accordance with "Tempsub" (and then "Subsort" and 
"Tempsubsort"). For example, the consecutive "Templib," 
"Libsort," and "Temptarsort" sorting operations sort 
identified sequences from the target library in order of 
decreasing abundance number (to generate a list of 
decreasing abundance numbers, each abundance number 
corresponding to a database entry, or several lists of 
decreasing abundance numbers, with the abundance numbers in 
each list corresponding to database entries of a selected 
type) with redundancies eliminated from each sorted list. 
The consecutive "Tempsub," "Subsort," and "Tempsubsort" 
sorting operations sort identified sequences from the 
subtractant library in order of decreasing abundance number 
(to generate a list of decreasing abundance numbers, each 
abundance number corresponding to a database entry, or 
several lists of decreasing abundance numbers, with the 
abundance numbers in each list corresponding to database 
entries of a selected type) with redundancies eliminated 
from each sorted list. 

The transcript sequences output from the "Temptarsort" 
operation typically represent sorted lists from which a 
histogram could be generated in which position along one 
(e.g., horizontal) axis indicates abundance number (of 
target library sequences), and position along another 
(e.g., vertical) axis indicates identified sequence value 
(e.g., human or non-human gene type). Similarly, the 
transcript sequences output from the "Tempsubsort" 
operation typically represent sorted lists from which a 
histogram could be generated in which position along one 
(e.g., horizontal) axis indicates abundance number (of 
subtractant library sequences) , and position along another 
(e.g., vertical) axis indicates identified sequence value 
(e.g., human or non-human gene type). 

The transcript sequences (sorted lists) output from 
the Tempsubsort and Temptarsort sorting operations are 
combined during the operation identified as "Cruncher." 
The "Cruncher" process identifies pairs of corresponding 
target and subtractant abundance numbers (both representing 
the same identified sequence value), and divides one by the 
other to generate a "ratio" value for each pair of 
corresponding abundance numbers, and then sorts the ratio 
values in order of decreasing ratio value. The data output 
from the "Cruncher" operation (the Final Transcript 
sequence in Fig. 2) is typically a sorted list from which a 
histogram could be generated in which position along one 
axis indicates the size of a ratio of abundance numbers 
(for corresponding identified sequence values from target 
and subtractant libraries) and position along another axis 
indicates identified sequence value (e.g., gene type). 
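
The pairing, division and sorting attributed to "Cruncher" can be sketched as follows. This is an illustration, not the Table 5 subroutine: the function name and the example entries are hypothetical, the two inputs are assumed to be mappings from identified sequence value to abundance number, and values missing from the subtractant list are left out because the text only speaks of corresponding pairs.

```python
def cruncher(target, subtractant):
    """Pair abundance numbers by identified sequence value, divide, and sort by decreasing ratio."""
    pairs = []
    for value, target_count in target.items():
        subtractant_count = subtractant.get(value)
        if subtractant_count:  # only corresponding pairs with a nonzero denominator
            pairs.append((value, target_count / subtractant_count))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Hypothetical sorted-list outputs of Temptarsort and Tempsubsort.
target_counts = {"entry_A": 10, "entry_B": 2, "entry_C": 5}
subtractant_counts = {"entry_A": 2, "entry_B": 4}
final = cruncher(target_counts, subtractant_counts)
```

Entries at the top of the resulting list are those most over-represented in the target library relative to the subtractant library, and entries at the bottom are the most under-represented.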

Preferably, prior to obtaining a ratio between the two 
library abundance values, the Cruncher operation also 
divides each ratio value by the total number of sequences 
in one or both of the target and subtractant libraries. 
The resulting lists of "relative" ratio values generated by 
the Cruncher operation are useful for many medical, 
scientific, and industrial applications. Also preferably, 
the output of the Cruncher operation is a set of lists, 
each list representing a sequence of decreasing ratio 
values for a different selected subset (e.g. protein 
family) of database entries. 

In one example, the abundance sort program of the 
invention tabulates for a library the numbers of mRNA 
transcripts corresponding to each gene identified in a 
database. These numbers are divided by the total number of 
clones sampled. The results of the division reflect the 
relative abundance of the mRNA transcripts in the cell type 
or tissue from which they were obtained. Obtaining this 
final data set is referred to herein as "gene transcript 
image analysis." The resulting subtracted data show 
exactly what proteins and genes are upregulated and 
downregulated in highly detailed complexity. 

6.6. HUVEC cDNA LIBRARY 

Table 2 is an abundance table listing the various gene 
transcripts in an induced HUVEC library. The transcripts 
are listed in order of decreasing abundance. This 
computerized sorting simplifies analysis of the tissue and 
speeds identification of significant new proteins which are 
specific to this cell type. This type of endothelial cell 
lines tissues of the cardiovascular system, and the more 
that is known about its composition, particularly in 
response to activation, the more choices of protein targets 
become available to affect in treating disorders of this 
tissue, such as the highly prevalent atherosclerosis. 

6.7. MONOCYTE-CELL AND MAST-CELL cDNA LIBRARIES 

Tables 3 and 4 show truncated comparisons of two 
libraries. In Tables 3 and 4 the "normal monocytes" are 
the HMC-1 cells, and the "activated macrophages" are the 
THP-1 cells pretreated with PMA and activated with LPS. 
Table 3 lists in descending order of abundance the most 
abundant gene transcripts for both cell types. With only 
15 gene transcripts from each cell type, this table permits 
quick, qualitative comparison of the most common 
transcripts. This abundance sort, with its convenient 
side-by-side display, provides an immediately useful 
research tool. In this example, this research tool 
discloses that 1) only one of the top 15 activated 
macrophage transcripts is found in the top 15 normal 
monocyte gene transcripts (poly A binding protein); and 2) 
a new gene transcript (previously unreported in other 
databases) is relatively highly represented in activated 
macrophages but is not similarly prominent in normal 
macrophages. Such a research tool provides researchers 
with a short-cut to new proteins, such as receptors, 
cell-surface and intracellular signalling molecules, which can 
serve as drug targets in commercial drug screening 
programs. Such a tool could save considerable time over 
that consumed by a hit and miss discovery program aimed at 
identifying important proteins in and around cells, because 
those proteins carrying out everyday cellular functions and 
represented as steady state mRNA are quickly eliminated 
from further characterization. 

This illustrates how the gene transcript profiles 
change with altered cellular function. Those skilled in 
the art know that the biochemical composition of cells also 
changes with other functional changes such as cancer, 
including cancer's various stages, and exposure to 
toxicity. A gene transcript subtraction profile such as in 
Table 3 is useful as a first screening tool for such gene 
expression and protein studies. 

6.8. SUBTRACTION ANALYSIS OF NORMAL MONOCYTE-CELL AND 
ACTIVATED MONOCYTE CELL cDNA LIBRARIES 

Once the cDNA data are in the computer, the computer 
program as disclosed in Table 5 was used to obtain ratios 
of all the gene transcripts in the two libraries discussed 
in Example 6.7, and the gene transcripts were sorted by the 
descending values of their ratios. If a gene transcript is 
not represented in one library, that gene transcript's 
abundance is unknown but appears to be less than 1. As an 
approximation, and to obtain a ratio (which would not be 
possible if the unrepresented gene were given an abundance 
of zero), genes which are represented in only one of the 
two libraries are assigned an abundance of 1/2 in the 
library from which they are absent. Using 1/2 
for unrepresented clones increases the relative importance 
of "turned-on" and "turned-off" genes, whose products would 
be drug candidates. The resulting print-out is called a 
subtraction table and is an extremely valuable screening 
method, as is shown by the following data. 
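
The ratio rule above can be illustrated with a short Python sketch. It is not the FoxBASE program of Table 5; the function name `subtraction_table`, the dictionary inputs, and the gene names are assumptions made for illustration, and the counts are made up. The column names `bgfreq` (count in the subtracted library) and `rfend` (count in the target library) follow the headings of Table 4, where a row with 0 and 4 clones is listed with a ratio of 8.000, which is 4 divided by 1/2.

```python
def subtraction_table(target_counts, background_counts, floor=0.5):
    """Ratio of target to background clone counts, sorted by descending ratio.

    A gene absent from one library is given an abundance of `floor` (1/2)
    in that library so that a ratio can still be formed.
    """
    genes = set(target_counts) | set(background_counts)
    rows = []
    for gene in genes:
        rfend = target_counts.get(gene, 0)        # count in the target library
        bgfreq = background_counts.get(gene, 0)   # count in the subtracted library
        numerator = rfend if rfend > 0 else floor
        denominator = bgfreq if bgfreq > 0 else floor
        rows.append((gene, bgfreq, rfend, numerator / denominator))
    # Highest ratio first; ties are ordered by gene name.
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


# Hypothetical counts for illustration only.
activated = {"GENE_A": 4, "GENE_B": 3, "GENE_C": 6}
normal = {"GENE_C": 3, "GENE_D": 2}
table = subtraction_table(activated, normal)
```

In this sketch "turned-on" genes rise to the top of the list and "turned-off" genes (ratio below 1) fall to the bottom, which mirrors the ordering described for the subtraction table.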

Table 4 is a subtraction table, in which the normal 
monocyte library was electronically "subtracted" from the 
activated macrophage library. This table highlights most 
effectively the changes in abundance of the gene 
transcripts by activation of macrophages. Even among the 
first 20 gene transcripts listed, there are several unknown 
gene transcripts. Thus, electronic subtraction is a useful 
tool with which to assist researchers in identifying much 
more quickly the basic biochemical changes between two cell 
types. Such a tool can save universities and 
pharmaceutical companies which spend billions of dollars on 
research valuable time and laboratory resources at the 
early discovery stage and can speed up the drug development 
cycle, which in turn permits researchers to set up drug 
screening programs much earlier. Thus, this research tool 
provides a way to get new drugs to the public faster and 
more economically. 

Also, such a subtraction table can be obtained for 
patient diagnosis. An individual patient sample (such as 
monocytes obtained from a biopsy or blood sample) can be 
compared with data provided herein to diagnose conditions 
associated with macrophage activation. 

Table 4 uncovered many new gene transcripts (labeled 
Incyte clones). Note that many genes are turned on in the 
activated macrophage (i.e., the monocyte had a 0 in the 
bgfreq column). This screening method is superior to other 
screening techniques, such as the western blot, which are 
incapable of uncovering such a multitude of discrete new 
gene transcripts. 

The subtraction-screening technique has also uncovered 
a high number of cancer gene transcripts (oncogenes rho, 
ETS2, rab-2 ras, YPT1-related, and acute myeloid leukemia 
mRNA) in the activated macrophage. These transcripts may 
be attributed to the use of immortalized cell lines and are 
inherently interesting for that reason. This screening 
technique offers a detailed picture of upregulated 
transcripts including oncogenes, which helps explain why 
anti-cancer drugs interfere with the patient's immunity 
mediated by activated macrophages. Armed with knowledge 
gained from this screening method, those skilled in the art 
can set up more targeted, more effective drug screening 
programs to identify drugs which are differentially 
effective against 1) both relevant cancers and activated 
macrophage conditions with the same gene transcript 
profile; 2) cancer alone; and 3) activated macrophage 
conditions. 

Smooth muscle senescent protein (22 kd) was 
upregulated in the activated macrophage, which indicates 
that it is a candidate to block in controlling 
inflammation. 
6.9. SUBTRACTION ANALYSIS OF NORMAL LIVER CELLS AND 
HEPATITIS INFECTED LIVER CELL cDNA LIBRARIES 

In this example, rats are exposed to hepatitis virus 
and maintained in the colony until they show definite signs 
of hepatitis. Of the rats diagnosed with hepatitis, one 
half of the rats are treated with a new anti-hepatitis 
agent (AHA). Liver samples are obtained from all rats 
before exposure to the hepatitis virus and at the end of 
AHA treatment or no treatment. In addition, liver samples 
can be obtained from rats with hepatitis just prior to AHA 
treatment. 

The liver tissue is treated as described in Examples 
6.2 and 6.3 to obtain mRNA and subsequently to sequence 
cDNA. The cDNA from each sample are processed and analyzed 
for abundance according to the computer program in Table 5. 
The resulting gene transcript images of the cDNA provide 
detailed pictures of the baseline (control) for each animal 
and of the infected and/or treated state of the animals. 
cDNA data for a group of samples can be combined into a 
group summary gene transcript profile for all control 
samples, all samples from infected rats and all samples 
from AHA-treated rats. 

Subtractions are performed between appropriate 
individual libraries and the grouped libraries. For 
individual animals, control and post-study samples can be 
subtracted. Also, if samples are obtained before and after 
AHA treatment, that data from individual animals and 
treatment groups can be subtracted. In addition, the data 
for all control samples can be pooled and averaged. The 
control average can be subtracted from averages of both 
post-study AHA and post-study non-AHA cDNA samples. If 
pre- and post-treatment samples are available, pre- and 
post-treatment samples can be compared individually (or 
electronically averaged) and subtracted. 
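
Pooling and averaging a group of samples before subtraction might look like the following Python sketch. The text does not say how the averaging is carried out, so this is an assumption: each sample is taken to be a mapping from gene to relative abundance, a gene missing from a sample counts as zero for that sample, and the names `average_profile`, `control_samples`, and the gene labels are hypothetical.

```python
def average_profile(samples):
    """Average relative abundance per gene across a group of samples.

    samples: list of dicts mapping gene -> relative abundance (assumed format).
    A gene missing from a sample contributes 0.0 for that sample.
    """
    if not samples:
        return {}
    genes = {gene for sample in samples for gene in sample}
    n = len(samples)
    return {gene: sum(s.get(gene, 0.0) for s in samples) / n for gene in genes}


# Made-up control group of two animals.
control_samples = [
    {"GENE_A": 0.25, "GENE_B": 0.5},
    {"GENE_A": 0.75},
]
control_average = average_profile(control_samples)
```

The averaged profile can then be compared with a post-study group average in the same way two individual libraries are compared.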

These subtraction tables are used in two general ways. 
First, the differences are analyzed for gene transcripts 
which are associated with continuing hepatic deterioration 
or healing. The subtraction tables are tools to isolate 
the effects of the drug treatment from the underlying basic 
pathology of hepatitis. Because hepatitis affects many 
parameters, additional liver toxicity has been difficult to 
detect with only blood tests for the usual enzymes. The 
gene transcript profile and subtraction provides a much 
more complex biochemical picture which researchers have 
needed to analyze such difficult problems. 

Second, the subtraction tables provide a tool for 
identifying clinical markers, individual proteins or other 
biochemical determinants which are used to predict and/or 
evaluate a clinical endpoint, such as disease, improvement 
due to the drug, and even additional pathology due to the 
drug. The subtraction tables specifically highlight genes 
which are turned on or off. Thus, the subtraction tables 
provide a first screen for a set of gene transcript 
candidates for use as clinical markers. Subsequently, 
electronic subtractions of additional cell and tissue 
libraries reveal which of the potential markers are in fact 
found in different cell and tissue libraries. Candidate 
gene transcripts found in additional libraries are removed 
from the set of potential clinical markers. Then, tests of 
blood or other relevant samples which are known to lack and 
have the relevant condition are compared to validate the 
selection of the clinical marker. In this method, the 
particular physiologic function of the protein transcript 
need not be determined to qualify the gene transcript as a 
clinical marker. 
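
The second screening step, removing candidates that also appear in other cell and tissue libraries, amounts to a set difference. The Python sketch below is illustrative only: the function name `filter_marker_candidates`, the library names, and the gene names are hypothetical, and a gene is treated as "found" in a library when its clone count is greater than zero.

```python
def filter_marker_candidates(candidates, other_libraries):
    """Drop candidate transcripts that are found in any additional library.

    candidates: gene names from the first subtraction screen.
    other_libraries: dict of library name -> {gene: clone count} (assumed format).
    """
    found_elsewhere = set()
    for counts in other_libraries.values():
        found_elsewhere.update(g for g, n in counts.items() if n > 0)
    # Keep the original candidate order for the survivors.
    return [g for g in candidates if g not in found_elsewhere]


# Made-up example: GENE_B is present in another tissue and is removed.
candidates = ["GENE_A", "GENE_B", "GENE_C"]
others = {
    "tissue_1": {"GENE_B": 2, "GENE_X": 5},
    "tissue_2": {"GENE_C": 0},
}
markers = filter_marker_candidates(candidates, others)
```

The surviving candidates would still need the validation against samples with and without the condition that the text describes.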



6.10. ELECTRONIC NORTHERN BLOT 

One limitation of electronic subtraction is that it is 
difficult to compare more than a pair of images at once. 
Once particular individual gene products are identified as 
relevant to further study (via electronic subtraction or 
other methods), it is useful to study the expression of 
single genes in a multitude of different tissues. In the 
lab, the technique of "Northern" blot hybridization is used 
for this purpose. In this technique, a single cDNA, or a 
probe corresponding thereto, is labeled and then hybridized 
against a blot containing RNA samples prepared from a 
multitude of tissues or cell types. Upon autoradiography, 
the pattern of expression of that particular gene, one at a 
time, can be quantitated in all the included samples. 

In contrast, a further embodiment of this invention is 
the computerized form of this process, termed here 
"electronic northern blot." In this variation, a single 
gene is queried for expression against a multitude of 
prepared and sequenced libraries present within the 
database. In this way, the pattern of expression of any 
single candidate gene can be examined instantaneously and 
effortlessly. More candidate genes can thus be scanned, 
leading to more frequent and fruitfully relevant 
discoveries. The computer program included as Table 5 
includes a program for performing this function, and Table 
6 is a partial listing of entries of the database used in 
the electronic northern blot analysis. 
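
Querying one gene against many sequenced libraries can be sketched as follows. This Python example is an illustration under assumptions, not the routine in Table 5: the function name `electronic_northern`, the representation of a library as a list of matched entry names, and all library and gene names are hypothetical.

```python
def electronic_northern(gene, libraries):
    """Report how often one gene appears in each library.

    libraries: dict of library name -> list of matched entry names,
    one per sequenced clone (assumed format).
    Returns (library, hits, clones_sampled, frequency) rows, highest frequency first.
    """
    rows = []
    for name, clones in libraries.items():
        total = len(clones)
        hits = sum(1 for entry in clones if entry == gene)
        frequency = hits / total if total else 0.0
        rows.append((name, hits, total, frequency))
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


# Made-up libraries for illustration.
libs = {
    "lib_1": ["GENE_A", "GENE_B", "GENE_A", "GENE_C"],
    "lib_2": ["GENE_B", "GENE_B"],
    "lib_3": ["GENE_A", "GENE_C"],
}
northern = electronic_northern("GENE_A", libs)
```

Reporting the frequency alongside the raw hit count keeps libraries of different sizes comparable, in the same spirit as the normalization used for gene transcript images.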

6.11. PHASE I CLINICAL TRIALS 

Based on the establishment of safety and effectiveness 
in the above animal tests, Phase I clinical tests are 
undertaken. Normal patients are subjected to the usual 
preliminary clinical laboratory tests. In addition, 
appropriate specimens are taken and subjected to gene 
transcript analysis. Additional patient specimens are 
taken at predetermined intervals during the test. The 
specimens are subjected to gene transcript analysis as 
described above. In addition, the gene transcript changes 
noted in the earlier rat toxicity study are carefully 
evaluated as clinical markers in the followed patients. 
Changes in the gene transcript analyses are evaluated as 
indicators of toxicity by correlation with clinical signs 
and symptoms and other laboratory results. In addition, 
subtraction is performed on individual patient specimens 
and on averaged patient specimens. The subtraction 
analysis highlights any toxicological changes in the 
treated patients. This is a highly refined determinant of 
toxicity. The subtraction method also annotates clinical 
markers. Further subgroups can be analyzed by subtraction 
analysis, including, for example, 1) segregation by 
occurrence and type of adverse effect; and 2) segregation 
by dosage. 

6.12. GENE TRANSCRIPT IMAGING ANALYSIS IN CLINICAL STUDIES 
A gene transcript imaging analysis (or multiple gene 
transcript imaging analyses) is a useful tool in other 
clinical studies. For example, the differences in gene 
transcript imaging analyses before and after treatment can 
be assessed for patients on placebo and drug treatment. 
This method also effectively screens for clinical markers 
to follow in clinical use of the drug. 

6.13. COMPARATIVE GENE TRANSCRIPT ANALYSIS BETWEEN SPECIES 

The subtraction method can be used to screen cDNA 
libraries from diverse sources. For example, the same cell 
types from different species can be compared by gene 
transcript analysis to screen for specific differences, 
such as in detoxification enzyme systems. Such testing 
aids in the selection and validation of an animal model for 
the commercial purpose of drug screening or toxicological 
testing of drugs intended for human or animal use. When 
the comparison between animals of different species is 
shown in columns for each species, we refer to this as an 
interspecies comparison, or zoo blot. 
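
A column-per-species layout of the kind just described could be assembled as in the Python sketch below. The text gives no format for the zoo blot beyond "columns for each species", so the function name `interspecies_table`, the input mapping, and the example values are all assumptions made for illustration.

```python
def interspecies_table(species_profiles):
    """Arrange per-species abundances side by side, one column per species.

    species_profiles: dict of species -> {gene: relative abundance} (assumed format).
    Returns (header, rows); a gene absent from a species is shown as 0.0.
    """
    species = sorted(species_profiles)
    genes = sorted({g for profile in species_profiles.values() for g in profile})
    header = ["gene"] + species
    rows = [[g] + [species_profiles[s].get(g, 0.0) for s in species] for g in genes]
    return header, rows


# Made-up profiles for the same cell type in two species.
profiles = {
    "rat": {"GENE_A": 0.25, "GENE_B": 0.5},
    "human": {"GENE_A": 0.125},
}
header, rows = interspecies_table(profiles)
```

A gene with a value in one column and zero in another is the kind of species-specific difference the comparison is meant to surface.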

Embodiments of this invention may employ databases 
such as those written using the FoxBASE programming 
language commercially available from Microsoft Corporation. 
Other embodiments of the invention employ other databases, 
such as a random peptide database, a polymer database, a 
synthetic oligomer database, or an oligonucleotide database 
of the type described in U.S. Patent 5,270,170, issued 
December 14, 1993 to Cull, et al., PCT International 
Application Publication No. WO 9322684, published November 
11, 1993, PCT International Application Publication No. WO 
9306121, published April 1, 1993, or PCT International 
Application Publication No. WO 9119818, published December 
26, 1991. These four references (whose text is 
incorporated herein by reference) include teaching which 
may be applied in implementing such other embodiments of 
the present invention. 

All references referred to in the preceding text are 
hereby expressly incorporated by reference herein. 

Various modifications and variations of the described 
method and system of the invention will be apparent to 
those skilled in the art without departing from the scope 
and spirit of the invention. Although the invention has 
been described in connection with specific preferred 
embodiments, it should be understood that the invention as 
claimed should not be unduly limited to such specific 
embodiments. 






WO 95/20681 



PCT/US95/01160 




TABLE 2 

Clone numbers 15000 through 20000 
Libraries: HUVEC 
Arranged by ABUNDANCE 
Total clones analyzed: 5000 
319 genes, for a total of 1713 clones 

      number    N   c  entry      s  descriptor 
  1   15365    67      HSRPL41       Riboptn L41 
  2   15004    65      NCY015004     INCYTE 015004 
  3   15638    63      NCY015638     INCYTE 015638 
  4   15390    50      NCY015390     INCYTE 015390 
  5   15193    47      HSFIB1        Fibronectin 
  6   15220    47      RRRPL9     R  Riboptn L9 
  7   15280    47      NCY015280     INCYTE 015280 
  8   15583    33      M62060        EST HHCH09 (IGR) 
  9   15662    31      HSACTCGR      Actin, gamma 
 10   15026    29      NCY015026     INCYTE 015026 
 11   15279    24      HSEF1AR       Elf 1-alpha 
 12   15027    23      NCY015027     INCYTE 015027 
 13   15033    20      NCY015033     INCYTE 015033 
 14   15198    20      NCY015198     INCYTE 015198 
 15   15809    20      HSCOLL1       Collagenase 
 16   15221    19      NCY015221     INCYTE 015221 
 17   15263    19      NCY015263     INCYTE 015263 
 18   15290    19      NCY015290     INCYTE 015290 
 19   15350    18      NCY015350     INCYTE 015350 
 20   15030    17      NCY015030     INCYTE 015030 
 21   15234    17      NCY015234     INCYTE 015234 
 22   15459    16      NCY015459     INCYTE 015459 
 23   15353    15      NCY015353     INCYTE 015353 
 24   15378    15      S76965        Ptn kinase inhib 
 25   15255    14      HUMTHYB4      Thymosin beta-4 
 26   15401    14      HSLIPCR       Lipocortin I 
 27   15425    14      HSPOLYAB      Poly-A bp 
 28   18212    14      HUMTHYMA      Thymosin, alpha 
 29   18216    14      HSMRP1        Motility relat ptn; MRP-1; CD-9 
 30   15189    13      HS18D         Interferon induc ptn 1-8D 
 31   15031    12      HUMFKBP       FK506 bp 
 32   15306    12      HSH2AZ        Histone H2A 
 33   15621    12      HUMLEC        Lectin, B-galbp, 14kDa 
 34   15789    11      NCY015789     INCYTE 015789 
 35   16578    11      HSRPS11       Riboptn S11 
 36   16632    11      M61984        EST HHCA13 (IGR) 
 37   18314    11      NCY018314     INCYTE 018314 
 38   15367    10      NCY015367     INCYTE 015367 
 39   15415    10      HSIFNIN1      interferon induc mRNA 
 40   15633    10      HSLDHAR       Lactate dehydrogenase 
 41   15813    10      CHKNMHCB   C  Myosin heavy chain B 
 42   18210    10      NCY018210     INCYTE 018210 
 43   18233    10      HSRPII140     RNA polymerase II 
 44   18996    10      NCY018996     INCYTE 018996 
 45   15088     9      HUMFERL       Ferritin, light chain 
 46   15714     9      NCY015714     INCYTE 015714 
 47   15720     9      NCY015720     INCYTE 015720 
 48   15863     9      NCY015863     INCYTE 015863 
 49   16121     9      HSET          Endothelin 
 50   18252     9      NCY018252     INCYTE 018252 
 51   15351     8      HUMALBP       Lipid bp, adipocyte 
 52   15370     8      NCY015370     INCYTE 015370 
TABLE 2 (continued) 

      number    N   c  entry      s  descriptor 
 53   15670     8      BTCIASHI      NADH-ubiq oxidoreductase 
 54   15795     8      NCY015795     INCYTE 015795 
 55   16245     8      NCY016245     INCYTE 016245 
 56   18262     8      NCY018262     INCYTE 018262 
 57   18321     8      HSRPL17       Riboptn L17 
 58   15126     7      XLRPL1RRF     Riboptn L1 
 59   15133     7      HSAC07        Actin, beta 
 60   15245     7      NCY015245     INCYTE 015245 
 61   15288     7      NCY015288     INCYTE 015288 
 62   15294     7      HSGAPDR       G-3-PD 
 63   15442     7      HUMLAMR       Laminin receptor, 54kDa 
 64   15485     7      HSNGMRNA      Uracil DNA glycosylase 
 65   16646     7      NCY016646     INCYTE 016646 
 66   18003     7      HUMPATA       Plsmnogen activ gene 
 67   15032     6                    Ubiquitin 
 68   15267     6      HSRPS8        Riboptn S8 
 69   15295     6      NCY015295     INCYTE 015295 
 70   15458     6                    Riboptn S10 
 71   15832     6                    UDP-galactose epimerase 
 72   15928     5      HUMAPOJ       Apolipoptn J 
 73   16598     5                    Tubulin, beta 
 74   18218     5      NCY018218     INCYTE 018218 
 75   18499     5      HSP27         Hydrophobic ptn p27 
 76   18963     5      NCY018963     INCYTE 018963 
 77   18997     5      NCY018997     INCYTE 018997 
 78   15432     5      HSAGALAR      Galactosidase A, alpha 
 79   15475     5      NCY015475     INCYTE 015475 
 80   15721     5      NCY015721     INCYTE 015721 
 81   15865     5      NCY015865     INCYTE 015865 
 82   16270     5      NCY016270     INCYTE 016270 
 83   16886     5      NCY016886     INCYTE 016886 
 84   18500     5      NCY018500     INCYTE 018500 
 85   18503     5      NCY018503     INCYTE 018503 
 86   19672     5      RRRPL34    R  Riboptn L34 
 87   15086     4      XLRPL1AR   F  Riboptn L1a 
 88   15113     4      HUMIFNWRS     tRNA synthetase, 
 89   15242     4      NCY015242     INCYTE 015242 
 90   15249     4      NCY015249     INCYTE 015249 
 91   15377     4      NCY015377     INCYTE 015377 
 92   15407     4      NCY015407     INCYTE 015407 
 93   15473     4      NCY015473     INCYTE 015473 
 94   15588     4      HSRPS12       Riboptn S12 
 95   15684     4      HSEF1G        Elf 1-gamma 
 96   15782     4      NCY015782     INCYTE 015782 
 97   15916     4      HSRPS18       Riboptn S18 
 98   15930     4      NCY015930     INCYTE 015930 
 99   16108     4      NCY016108     INCYTE 016108 
100   16133     4      NCY016133     INCYTE 016133 
TABLE 4 

Libraries: THP-1 
Subtracting: HMC 
Sorted by ABUNDANCE 
Total clones analyzed: 7375 
1057 genes, for a total of 2151 clones 

number   entry      s  descriptor 
10022    HUMIL1        IL 1-beta 
10036    HSMDNCF       IL-8 
10089    HSLAG1CDN     Lymphocyte activ gene 
10060    HUMTCSM       RANTES 
10003    HUMMIP1A      MIP-1 
10689    HSOP          Osteopontin 
11050    NCY011050     INCYTE 011050 
10937    HSTNFR        TNF-alpha 
10176    HSSOD         Superoxide dismutase 
10886    HSCDW40       B-cell activ, NGF-relat 
10186    HUMAPR        Early resp PMA-induc 
10967    HUMGDN        PN-1, glial-deriv 
11353    NCY011353     INCYTE 011353 
10298    NCY010298     INCYTE 010298 
10215    HUM4COLA      Collagenase, type IV 
10276    NCY010276     INCYTE 010276 
10488    NCY010488     INCYTE 010488 
11138    NCY011138     INCYTE 011138 
10037    HUMCAPPRO     Adenylate cyclase 
10840    HUMADCY       Adenylate cyclase 
10672    HSCD44E       Cell adhesion glptn 
12837    HUMCYCLOX     Cyclooxygenase-2 
10001    NCY010001     INCYTE 010001 
10005    NCY010005     INCYTE 010005 
10294    NCY010294     INCYTE 010294 
10297    NCY010297     INCYTE 010297 
10403    NCY010403     INCYTE 010403 
10699    NCY010699     INCYTE 010699 
10966    NCY010966     INCYTE 010966 
12092    NCY012092     INCYTE 012092 
12549    HSRHOB        Oncogene rho 
10691    HUMARF1BA     ADP-ribosylation fctr 
12106    HSADSS        Adenylosuccinate synthetase 
10194    HSCATHL       Cathepsin L 
10479    CLMCYCA    I  Cyclin A 
10031    NCY010031     INCYTE 010031 
10203    NCY010203     INCYTE 010203 
10288    NCY010288     INCYTE 010288 
10372    NCY010372     INCYTE 010372 
10471    NCY010471     INCYTE 010471 
10484    NCY010484     INCYTE 010484 
10859    NCY010859     INCYTE 010859 
10890    NCY010890     INCYTE 010890 
11511    NCY011511     INCYTE 011511 
11868    NCY011868     INCYTE 011868 
12820    NCY012820     INCYTE 012820 
10133    HSI1RAP       IL-1 antagonist 
10516    HUMP2A        Phosphatase, regul 2A 
11063    HUMB94        TNF-induc response 
11140    HSHB15RNA     HB15 gene; new Ig 
10788    NCY001713     INCYTE 001713 
10033    NCY010033     INCYTE 010033 
10035    NCY010035     INCYTE 010035 
10084    NCY010084     INCYTE 010084 
10236    NCY010236     INCYTE 010236 
10383    NCY010383     INCYTE 010383 

bgfreq   rfend   ratio 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        4       8.000 
0        3       6.000 
0        3       6.000 
0        3       6.000 
0        3       6.000 
0        3       6.000 
0        3       6.000 
0        3       6.000 
TABLE 4 (continued) 

number   entry      s  descriptor       bgfreq   rfend   ratio 
10450    NCY010450     INCYTE 010450    0        3       6.000 
10470    NCY010470     INCYTE 010470    0        3       6.000 
10504    NCY010504     INCYTE 010504    0        3       6.000 
10507    NCY010507     INCYTE 010507    0        3       6.000 
10598    NCY010598     INCYTE 010598    0        3       6.000 
10779    NCY010779     INCYTE 010779    0        3       6.000 
10909    NCY010909     INCYTE 010909    0        3       6.000 
10976    NCY010976     INCYTE 010976    0        3       6.000 
10985    NCY010985     INCYTE 010985    0        3       6.000 
11052    NCY011052     INCYTE 011052    0        3       6.000 
11068    NCY011068     INCYTE 011068    0        3       6.000 
11134    NCY011134     INCYTE 011134    0        3       6.000 
11136    NCY011136     INCYTE 011136    0        3       6.000 
11191    NCY011191     INCYTE 011191    0        3       6.000 
11219    NCY011219     INCYTE 011219    0        3       6.000 
11386    NCY011386     INCYTE 011386    0        3       6.000 
11403    NCY011403     INCYTE 011403    0        3       6.000 
11460    NCY011460     INCYTE 011460    0        3       6.000 
11618    NCY011618     INCYTE 011618    0        3       6.000 
11686    NCY011686     INCYTE 011686    0        3       6.000 
12021    NCY012021     INCYTE 012021    0        3       6.000 
12025    NCY012025     INCYTE 012025    0        3       6.000 
12320    NCY012320     INCYTE 012320    0        3       6.000 
12330    NCY012330     INCYTE 012330    0        3       6.000 
12853    NCY012853     INCYTE 012853    0        3       6.000 
14386    NCY014386     INCYTE 014386    0        3       6.000 
14391    NCY014391     INCYTE 014391    0        3       6.000 
TABLE 5 

* Master menu for SUBTRACTION output 
SET TALK OFF 
SET SAFETY OFF 
SET EXACT ON 
SET TYPEAHEAD TO 0 
CLEAR 
SET DEVICE TO SCREEN 
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf" 
GO TOP 
STORE NUMBER TO INITIATE 
GO BOTTOM 
STORE NUMBER TO TERMINATE 
STORE '         ' TO Target1 
STORE '         ' TO Target2 
STORE '         ' TO Target3 
STORE '         ' TO Object1 
STORE '         ' TO Object2 
STORE '         ' TO Object3 
STORE 0 TO ANAL 
STORE 0 TO EMATCH 
STORE 0 TO HMATCH 
STORE 0 TO OMATCH 
STORE 0 TO IMATCH 
STORE 0 TO PTF 
STORE 1 TO BAIL 
DO WHILE .T. 
* Program.: Subtraction 2.fmt 
* Date....: 10/11/94 
* Version.: FoxBASE+/Mac, revision 1.10 
* Notes...: Format file Subtraction 2 
* 
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva" COLOR 0,0,0, 
@ PIXELS 75,120 TO 178,241 STYLE 3871 COLOR 0,0,-1,24610,-1,6947 
@ PIXELS 27,134 SAY "Subtraction Menu" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 117,126 GET Ematch STYLE 65536 FONT "Chicago",12 PICTURE "@*C Exact" SIZE 15, CO 
@ PIXELS 135,126 GET Hmatch STYLE 65536 FONT "Chicago",12 PICTURE "@*C Homologous" SIZE 15,1 
@ PIXELS 153,126 GET Omatch STYLE 65536 FONT "Chicago",12 PICTURE "@*C Other spc" SIZE 15,84 
@ PIXELS 90,152 SAY "Matches:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 171,126 GET Imatch STYLE 65536 FONT "Chicago",12 PICTURE "@*C Incyte" SIZE 15,65 CO 
@ PIXELS 252,137 GET initiate STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 252,236 GET terminate STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 252,35 SAY "Include clones " STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 252,215 SAY "->" STYLE 65536 FONT "Geneva",14 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 198,126 GET PTF STYLE 65536 FONT "Chicago",12 PICTURE "@*C Print to file" SIZE 15,9 
@ PIXELS 90,9 TO 181,109 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1 
@ PIXELS 90,288 TO 181,397 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1 
@ PIXELS 81,296 SAY "Background:" STYLE 65536 FONT "Geneva",270 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 45,135 GET ANAL STYLE 65536 FONT "Chicago",12 PICTURE "@*R Overall;Function" SIZE 4 
@ PIXELS 81,26 SAY "Target:" STYLE 65536 FONT "Geneva",270 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 108,20 GET target1 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 135,20 GET target2 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 162,20 GET target3 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 108,299 GET object1 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 135,299 GET object2 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 162,299 GET object3 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1 
@ PIXELS 276,324 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Run;Bail out" SIZE 4112 
* EOF: Subtraction 2.fmt 
READ 
IF Bail=2 
CLEAR 
CLOSE DATABASES 
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf" 
SET SAFETY ON 
SCREEN 1 OFF 
RETURN 
ENDIF 
STORE VAL(SYS(2)) TO STARTIME
STORE UPPER(Target1) TO Target1
STORE UPPER(Target2) TO Target2
STORE UPPER(Target3) TO Target3
STORE UPPER(Object1) TO Object1
STORE UPPER(Object2) TO Object2
STORE UPPER(Object3) TO Object3
CLEAR
SET TALK ON
GAP = TERMINATE-INITIATE+1
GO INITIATE

SToSl^ ™ S * m ' 1 ^^' D ' P ' 2 <*'^ TO TEMPNUM 

COUNT TO TOT 

S^JHp^EP 1 ^ 0 P0R ^'E'.OR^^O'.OR.D^'H'iOR.tb'N'.OR.Dn'I' 

IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. Imatch=0
  COPY TO TEMPDESIG
ELSE
  COPY STRUCTURE TO TEMPDESIG
  USE TEMPDESIG
  IF Ematch=1
    APPEND FROM TEMPNUM FOR D='E'
  ENDIF
  IF Hmatch=1
    APPEND FROM TEMPNUM FOR D='H'
  ENDIF
  IF Omatch=1
    APPEND FROM TEMPNUM FOR D='O'
  ENDIF
  IF Imatch=1
    APPEND FROM TEMPNUM FOR D='I'.OR.D='X'
    *.OR.D='N'
  ENDIF
ENDIF
COUNT TO STARTOT
COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
APPEND FROM TEMPDESIG FOR library=UPPER(target1)
IF target2<>' '
  APPEND FROM TEMPDESIG FOR library=UPPER(target2)
ENDIF
IF target3<>' '
  APPEND FROM TEMPDESIG FOR library=UPPER(target3)
ENDIF
COUNT TO ANALTOT
USE TEMPDESIG
COPY STRUCTURE TO TEMPSUB
USE TEMPSUB
APPEND FROM TEMPDESIG FOR library=UPPER(object1)
IF target2<>' '
  APPEND FROM TEMPDESIG FOR library=UPPER(object2)
ENDIF
IF target3<>' '
  APPEND FROM TEMPDESIG FOR library=UPPER(object3)
ENDIF
COUNT TO SUBTRACTOT
SET TALK OFF
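
The designation filter above can be pictured in a few lines of Python. This is only a sketch under stated assumptions, not the FoxBASE+ program: the function name, the keyword arguments, and the dictionary-based clone records are hypothetical, and only the one-letter designation codes (E, H, O, I, X) are taken from the listing.

```python
def filter_by_designation(clones, exact=False, human=False, other=False, incyte=False):
    """Keep clones whose designation code D matches any ticked checkbox."""
    wanted = set()
    if exact:
        wanted.add("E")
    if human:
        wanted.add("H")
    if other:
        wanted.add("O")
    if incyte:
        wanted.update({"I", "X"})  # the listing appends D='I' and D='X' together
    if not wanted:
        # No box ticked: the listing copies every record unchanged.
        return list(clones)
    # Assumption: input order is kept, whereas the listing appends
    # one designation at a time.
    return [clone for clone in clones if clone["D"] in wanted]


# Hypothetical records, for illustration only.
clones = [
    {"number": 1, "D": "E"},
    {"number": 2, "D": "H"},
    {"number": 3, "D": "X"},
]
selected = filter_by_designation(clones, human=True, incyte=True)
```

With no checkbox ticked the sketch returns every record, mirroring the COPY TO TEMPDESIG branch.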



* COMPRESSION SUBROUTINE A
? 'COMPRESSING QUERY LIBRARY'
USE TEMPLIB
SORT ON ENTRY, NUMBER TO LIBSORT
USE LIBSORT
COUNT TO IDGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= IDGENE
    PACK
    COUNT TO AUNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  STORE D TO DESIGA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    STORE D TO DESIGB
    IF TESTA = TESTB.AND.DESIGA=DESIGB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
SORT ON RFEND/D, NUMBER TO TEMPTARSORT
USE TEMPTARSORT
REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPTARCO

* COMPRESSION SUBROUTINE B
? 'COMPRESSING TARGET LIBRARY'
USE TEMPSUB
SORT ON ENTRY, NUMBER TO SUBSORT
USE SUBSORT
COUNT TO SUBGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= SUBGENE
    PACK
    COUNT TO BUNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  STORE D TO DESIGA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    STORE D TO DESIGB
    IF TESTA = TESTB.AND.DESIGA=DESIGB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
SORT ON RFEND/D, NUMBER TO TEMPSUBSORT
USE TEMPSUBSORT
* REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPSUBCO

* FUSION ROUTINE
? 'SUBTRACTING LIBRARIES'
USE SUBTRACTION
COPY STRUCTURE TO CRUNCHER
SELECT 2
USE TEMPSUBSORT
SELECT 1
USE CRUNCHER
APPEND FROM TEMPTARSORT
COUNT TO BAILOUT
MARK = 0
DO WHILE .T.
  MARK = MARK+1
  IF MARK>BAILOUT
    EXIT
  ENDIF
  GO MARK
  STORE ENTRY TO SCANNER
  SELECT 2
  LOCATE FOR ENTRY=SCANNER
  IF FOUND()
    STORE RFEND TO BIT1
    STORE RFEND TO BIT2
  ELSE
    STORE 1/2 TO BIT1
    STORE 0 TO BIT2
  ENDIF
  SELECT 1
  REPLACE BGFREQ WITH BIT2
  REPLACE ACTUAL WITH BIT1
  LOOP
ENDDO

SELECT 1
REPLACE ALL RATIO WITH RFEND/ACTUAL
? 'DOING FINAL SORT BY RATIO'

'^J^^^^sm^teSCBXSWR, 00 PINAL 
USE FINAL

DO CASE
CASE PTF=0
  SET DEVICE TO PRINT
  SET PRINT ON
  EJECT
CASE PTF=1
  SET ALTERNATE TO "Adenoid:Patent Figures:Subtraction.txt"
  SET ALTERNATE ON
ENDCASE

STORE VAL(SYS(2)) TO FINTIME
IF FINTIME<STARTIME
  STORE FINTIME+86400 TO FINTIME
ENDIF
STORE FINTIME - STARTIME TO COMPSEC
STORE COMPSEC/60 TO COMPMIN

SET MARGIN TO 10
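
The elapsed-time computation above reads a seconds-since-midnight clock twice and adds 86400 when the run crosses midnight. A minimal Python sketch of the same arithmetic follows; the function and constant names are hypothetical and are not part of the listing.

```python
SECONDS_PER_DAY = 86400


def elapsed_minutes(start_seconds, finish_seconds):
    """Minutes between two seconds-since-midnight readings.

    Assumes the run lasts less than 24 hours, as the listing does.
    """
    if finish_seconds < start_seconds:
        # The clock wrapped past midnight, so shift the finish by one day.
        finish_seconds += SECONDS_PER_DAY
    return (finish_seconds - start_seconds) / 60


same_day = elapsed_minutes(100, 220)
across_midnight = elapsed_minutes(86340, 60)
```

Both calls cover 120 seconds; the second starts one minute before midnight and ends one minute after it.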

@ 1,1 SAY "Library Subtraction Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,

?
?
?

? date()
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,5,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
?? Target1
IF Target2<>' '
  ?? ', '
  ?? Target2
ENDIF
IF Target3<>' '
  ?? ', '
  ?? Target3
ENDIF
? 'Subtracting: '
? Object1
IF Object2<>' '
  ?? ', '
  ?? Object2
ENDIF
IF Object3<>' '
  ?? ', '
  ?? Object3
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
  ?? 'All'
ENDIF
IF Ematch=1
  ?? 'Exact, '
ENDIF
IF Hmatch=1
  ?? 'Human, '
ENDIF
IF Omatch=1
  ?? 'Other sp. '
ENDIF
IF Imatch=1
  ?? 'INCYTE'
ENDIF
IF ANAL=1
  ? 'Sorted by ABUNDANCE'
ENDIF
IF ANAL=2
  ? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '

. ... ??.«MWtf,3.0L-.,- - * . 

? 'Total clones analyzed: '
?? STR(STARTOT,5,0)
? 'Total computation time: '
?? STR(COMPMIN,5,3)
?? ' minutes'
?

. detonation f - distribution location r * function s « epeiies i = inte 

jm 1 T*PB 0 HEADIN3 'Screen 1- AT 40,2 SIZH 386,4*2 PIXELS FCNT 'Geneva-,9 COLOR 0,0,0, 

CASE ANALal 

?? STR{AUNIQUE f 4,0) 

?? ' genes, for a total of 1 • 

' clones' 

? . , 

SCSEQ? 1 1YPS 0 HEAEEN3 •Screen 1" AT 40.2 SIZE 2fifi 403 BTypre m^n • - ~ ■ 

— ■SWS* ^^'•^TO'.fiiSSw^iS.^ ^ °'°' 0 ' 

CLOSE DATABASES , 

*treB/arartGuyiF0XBASE+/Mafi:fax files i clones, dbf 

CA8E.aNAL*2 

arrange/function 
SEP jpmot- CN 

SEP KEADEQS OT 

SCREEN 1 mo HEAD** Screen I'.AT 40,2 SIZE 286,492 PIXELS FOOT 'Helvetica-, 268 COU* 0 
* ' BINDING PROTEINS' 

.f^.f^ltS'Sd'Sisi- 7 40 ' 2 S1Z * 286 ' 4M ^ -K^ica-.^S COLOR 0 

'f^^B&gSS^Z^ l ' ' AT <°' 2 «» «■» 'Helvetica.,265 COLOR 0 

SCREEN1 KPE 0 HEADING 'Screen 1- AT 40.2 SIZE 286,492 PIXELS FOOT 'Geneva" 7 eci/tt n n n 

list OFF fields ««*er.D.r,a,H.WW.S.iro»X^ 0 ' ? '°' 

f^i^S^Sj^ 11 M 4 °' 2 ^. 286,492 PIXELS W 'Helvetica' ,265 COLOR 0 
BCHBEN1 KPB 0 KElOTJG 'Screen 1' AT 40.2 SIZE 286,492 PIXELS FONT 'Geneva' 7 PQT«» n n n 
list OFF fields «ato,D,P,z,R,EinOT.s,is^iTOR,BGFaEQ,RiHro f RAKO.I jtorr= : S' 

rS5 Sg^^ef" 66 " r W "? a SI2E 286 ' 4?2 "» 'Helvetica', 265 color 0 

SCREEN1 TYPE -0 HERDING 'Screen 1' AT 40,'2 SIZE 286,492 PIXELS FONT 'Geneva' 7 «vn» n n a 
list OFF fields number, D, F , 2, R, ENTRY, B , DESCRIPTOR, ESFRSQ, RFEND, RATIO, I FOfTlL ' I ' °'. 0 ' 0 ' 

i ""J ° HE " iKG ' toeen 1 «g l ffi .™ 286 ' 492 ™« «■» -Helvetica',268 COLOR 0 

f^aTSeie^ 8 '^"^ 4 °' 2 " ZE 2 . 86 ' 492 "»» ^-Helvetica', 2 65 COLOR 0 

SCRBajl WPE 0.HEADISG -Screen 1' AT .40, 2 SIZE 286,492 PIXELS •FOOT 'Geneva' 7 eorno n * n 
list OFF fields X^r,D,F,Z.R.ENr!iy.9,lKSCRIPTOR,BGFREQ,RF^ f RkTM tl TORR^O' ' ' 

F^taiX^SSS, : GCreCn V » 40 ' 2 SIZE 286 '" 2 ™ "» 'Helvetica* ,265 COLOR 0 
SCREEN! TOPE 0 HEADING 'Screen 1- AT 40,2 SIZE 286,492 PIXELS FONT 'Geneva' 7 COLOR o n O 
list OFF fields number, D, F, Z , R. ENTRY, 3 . DESCRIPTOR . BSFRZQ, RFKND, RATIO , I FORR= ' O ' 
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SCRE3T 1 TYPE 0 H8ADIHG 'Seraen 1* JOT AO 2 er?r jo-» 

? 'Viral elenenteT^ 1 . 0,2 SIZE 286 '" 2 KCxas *■» •Helvetica'.ass COLOR o 

SCREW 1-TVPS 0 HEADXN5 'Screen 1* AT 40,3 SIZE 28fi 4« error a •» •» ' , • 

? 'Tumor-relate* antigens. • ' 386,492 PIXELS PCNr 'Helvetica'^ COLOR 0 

pro i™ o mro* 2^\^-"gjf« ^ «■ •»*—■ . 

fSi^^'^^^^^gJ';:™ •« •«.iv. tl „.. Jts „ 

fSS,!."™ • to " n 1- « «' 3 4128 ™> -HUvKica-.Ms oaoi.0 

SCREEN 1 TYPE 0 HEADING "Screen 1' AT AO 2 stzp 3*6 aq^ btwt« — 

.11* OFF 'fields *^.M.%.*^$jiS^ ^0.0, 

fSic^proS? l * Ar * 40 -'' 2 61ZE 28S '« 2 «»■ 'Helvetica' ,265 COLOR 0 

SCREEN 1 TSf?E 0 HEADING "Screen 1* AT 40 2 SIZE 3Hfi Trrvwe cmm — 

iiat off fields «^.D;r.iS^.S4^;^™ ^ 0.0.0/ 
• FEASZlilg! l ' ^ 4 °' 2 SIZB ' 286 '" 2 W" «** MfcXvetica-,265 COLOR 0 

.SCREEN 1 TYPE 0 HEAD** -Screen i- AT 40,2 SIZE 286.492 P&ELS . pqW -Helvetica- , 2 6 8 C0tQR 0 



' ' ENZYMES' 
? 



Ffer^T^ " SCreen 11 " 4 °' 2 8132 a ? 6 '«2 PIXELS FOOT .He lV etice.. 26S COLOR 0 

aw &u !&:??ri^ o.o.o, 

f^tLaTJd^toreT 63 - 1 "" 4 °' 2 SKE 286 ' 4 " «™ -Helvetica-, 265 Ct»L0R 0 

ECREE4 1 TYPE 0 HEADHvG "Screen 1 B AT 40,3 stzk 9ftrt aoi Mvrm . 

list OF? fields ^.UI.l.lUW«?8ii»^ °'°' 0 ' 

F5g^&5EgZ32S?. l ' 40 ' 2 S1ZS 28S ' 492 ™* «» "Helvetica-^* COLOR 0 

BToA S&! BJ^^^ m.«. 

SlJT 5 ' ,£dreen 11 * 4 °' 2 S1ZS 26S '« 2 «»* FONT -Helvetica-,265 COLOR 0 

?5SSi S JSSSJ?"-- 11 4 °' 2 S ^ 2 "< 492 PKE * ^ •Helvetic... 2S 5 COLOR 0 
SCREEN 1 TWE 0 HEADING "Screen !• AT 40.2 S12S 286.492 PIXELS FONT "Geneva-.? COLOR o'.O.O; 
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list COT fields nw^,D,p,z,R,B*m,s,iraaara«,80^ for Ra'M- 

" ' f^S^^lSssJis^ l * * 40,3 sn * " M92 iBM^'^fi a&» o 

SCREEN! .TYPE 0 'HEABDJ3 •Screen !• AT 40,2 SIZE 285,492 PIXELS FCNT 'Geneva',7 ™ TrV» 0.0.0* 
list. OCT fields nonber,D,F,Z,R,anw,S,rffiSCias^ ns Ro'H 1 ^^ ' ' ' 

f^Sl SilSS^ ,SSCreeB *' 4 ?' 2 38S ' 4S2 ?BtS;t,S ' ,H «lvetiea»,aS5 COLOR 0 
SCRBWl TYPE 0 HEADING 'Screen X' AT 40,3 SIZE 285,492 PIXELS FCNT "Geneva- 7 COLOR 0.0.0. 
list OCT fields »nnber,D,F,Z,R,BNTRY,S,KSCRIFTOR,8^^ for r^w- ' ' 

VwhZ ^jgff" 2 " 3 ,Berem *' «« 286,492 PIXELS FOOT 'Helvetica" ,285 COLOR 0 

fp^J J"? 0 HEA0IN3 'Screen 1- AT 40,2 SIZE 286,492 PIXELS FOOT 'Geneva* ,7 COLOR 0.0,0 
list OCT fields au^r,D,F,Z,R,B^,a,0ESCRI^ TORrV'B' 

SCREEN 1 WPS 0 HEAOB& "Screen 1' AT 40,2 SIZE 286,492 PIXELS FCNT ■Helvetica', 268 COLOR 0 

I ' MISCELLANEOUS CATEGORIES' 

? 

? C S a ™ Sna^^ ' SCre6n *' AT 40 ' 3 ^ 28M92 PIXELS ,Hel ^tica-,265 COLOR 0 

"pO.'toMn 1" AT 40,2 SI22 286,492 PIXELS FONT 'Geneva- ,7 COLOR 0 0 0 

ix9t off fields nwte.D # r i z,a,n^ °' 0 ' 0 ' 

fSiJS.? . ,SCreea *' ** 40 ' 2 aZS 286 ' 492 PIXEM 'Helvetica',265 CGLOR'O 

SCREEN JL WPE 0 HEADING -Screen 1- AT 40,2 SIZE 296,492 PIXELS TOT "Genova- 7 COLOR 0 0 0 
list OFF fields nurnber,D,F,Z,R,B^ °'°' 0 ' 

SLeUs^ 1 ^ 3 " SCr€fn ** AT 40:2 5122 386,492 POG ¥ ™ '^lvetica-^SS COLOR 0 

fF^FSei ° ^E^ 01 * 0 "Screen 1- AT 40.2 SIZE 286.492 PIXELS ' 'Geneva • 7 COLOR boo 

list OFF fields mate.D,F,Z.lt,n^ °' 0 ' 0 

' f^L^un^^f^ 11 KT 40 ' 2 5ISB 286 ' 492 PCCEM ™ ■Helvetica-,265 COLOR 0 
SO^l TYPE 0 HERDING -Screen 1- AT 40,2 SIZE 286,492 PIXELS SWT -Geneva- 7 COLOR 0 0 0 

listOFP fields nuaber,D,?,z,R, °' 0 ' 0 ' 

DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
ERASE TEMPLIB.DBF
ERASE TEMPNUM.DBF
ERASE TEMPDESIG.DBF
SET MARGIN TO 0
CLEAR
LOOP
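
The subtraction logic of the program above (count each gene in the target libraries, look it up in the subtractant libraries, and rank by the ratio of the two counts) can be sketched in Python as follows. This is an illustration under stated assumptions, not the original program: the function name and dictionary keys are hypothetical, matching is done on the entry name alone (the listing also compares the designation code when collapsing duplicates), and the tie-break on entry name is an assumption because the final SORT line of the listing is not legible. The value 1/2 for genes absent from the subtractant library is taken from the listing.

```python
from collections import Counter


def subtract_libraries(target_entries, subtractant_entries):
    """Rank target-library genes by abundance relative to a subtractant library."""
    target = Counter(target_entries)            # RFEND in the target library
    background = Counter(subtractant_entries)   # BGFREQ in the subtractant library
    rows = []
    for entry, rfend in target.items():
        bgfreq = background.get(entry, 0)
        # The listing stores 1/2 as the divisor when the gene is not found,
        # which avoids dividing by zero.
        actual = bgfreq if bgfreq > 0 else 0.5
        rows.append(
            {"entry": entry, "rfend": rfend, "bgfreq": bgfreq, "ratio": rfend / actual}
        )
    # Highest ratio first; ties broken by entry name (an assumption).
    rows.sort(key=lambda row: (-row["ratio"], row["entry"]))
    return rows


# Hypothetical clone lists, for illustration only.
rows = subtract_libraries(
    ["actin", "actin", "actin", "il8", "ef1a", "ef1a"],
    ["actin", "ef1a", "ef1a", "ef1a", "ef1a"],
)
```

A gene seen only in the target library gets a ratio of twice its count, so it ranks above genes that are merely somewhat enriched.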
* Northern (single), version 11-25-94
CLOSE DATABASES
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
CLEAR

STORE .• • TO Ecbject 

STORE • , „ _ 

STORE 0 TO NurnS. TO 

STORE 0 'TO Zog 

STORE 1 TO Ball 

DO WHILE .T. 

* Program, i Northern (single) , 6nt 

* Data....; 8/ am . ' 

• Version,! .FbXBASE+/Kae,' r*vieion i.10 

t Notes. «.v.f .Format file Northern (single) 

• PIXELS 115 173 GET meet fiTOLB^ S?™ ^''U COLOR 0,0,0,-1 -1 -1 

| PIXELS 145 89 »Y •^£fci^ S £^«??^'" 3123 COLOR 0,6,0.' -1,-1 -1 

« PIXELS 145,173 GET Dobleet BTOTfi ObSw? fn .S*"^'." COLOR 0.0.6;-l -l -I 

f PIXELS 35,89 SAY 1^ «S « ^ 'Hi 08 15 ' 24 * COLOR 6,0 , 0-1-1 '-1 

•™ 80 ' 1S2 - ci7A^» 

* EOF: Northern (single).fmt

IF Bail=2
  CLEAR
  SCREEN 1 OFF
  RETURN
ENDIF

S ^^'^^i^/Mac.Pox filea.Lootop.abt . 
IP Bob jeoto' 

TO " L00kl ? • 

USB "Lookup entry. dbf f 

JOCATE FOR.iiooJccEobject 
to ..WP.W0HDO ' 
CLEAR 

LOOP 



STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup entry.dbf"
ENDIF

•IP-Dobjacto' . 
SET EXACT OFF 
SOT ffitfBIY OFF 



^J^° 0,cup **ariptor.fflbff« 



CLEAR 



ft 
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup descriptor.dbf"
SET EXACT ON
ENDIF

IF Numb<>0
  USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
  GO Numb
  BROWSE
  STORE Entry TO Searchval
ENDIF
CLEAR

? 'Northern analysis for entry '
?? Searchval
?
? 'Enter Y to proceed'
WAIT TO OK

IF UPPER(OK)<>'Y'
  SCREEN 1 OFF
  RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  STORE library TO TESTA
  SKIP
  STORE library TO TESTB
  IF TESTA = TESTB
    DELETE
  ENDIF
  MARK1 = MARK1+1
  LOOP
ENDDO ROLL

* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON
* MASTER ANALYSIS 3; VERSION 12-3-94
* Master menu for analysis output
CLOSE DATABASES
SET TALK OFF
SET SAFETY OFF
CLEAR
SET DEVICE TO SCREEN
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE 0 TO ENTIRE
STORE 0 TO CONDEN
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO XMATCH
STORE 0 TO PRINTON
STORE 0 TO PTF
DO WHILE .T.

* Program.: Master analysis.fmt
* Date....: 12/ 9/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Master analysis



t as ii'H ^ -ss-sSi 0 ^ ss"-s2y?y ssa-spsv*! °.°-^-i 

fl PIXELS 54,261 GET anal STYLE 65536 FONT '(Vram" I! C Conde «3ed format' SIZE 

• "5,126 GET HMATCH STVLE 6S536 £££££ If 2 atact " SIZ E 15,62 CO 

Hi IF- y*E : ri t "«i 



* EOF: Master analysis.fmt

IF ANAL=9
  CLEAR
  CLOSE DATABASES
  ERASE TEMPMASTER.DBF
  USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
  SCREEN 1 OFF
  RETURN
ENDIF
CLEAR
? INITIATE
? TERMINATE
? CONDEN
? ANAL
? Ematch
? Hmatch
? Omatch
? IMATCH
SET TALK ON

I? ENTIHEs 2 
USE ■ Unique libraries .'dbf • 

REPLACE ALL i WITH • • * 

gg£ S ?IELDS i ' lil ^' litea ^'total, e nt ere d AT 0,0 

COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
IF ENTIRE=1
  APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
ENDIF
IF ENTIRE=2
  USE "Unique libraries.dbf"
  COPY TO SELECTED FOR UPPER(i)='Y'
  USE SELECTED
  STORE RECCOUNT() TO STOPIT
  MARK=1
  DO WHILE .T.
    IF MARK>STOPIT
      CLEAR
      EXIT
    ENDIF
    USE SELECTED
    GO MARK
    STORE library TO THISONE
    ? 'COPYING '
    ?? THISONE
    USE TEMPLIB
    APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf" FOR library=THISONE
    STORE MARK+1 TO MARK
    LOOP
  ENDDO
ENDIF

COPY STRUCTURE TO TEMPDESIG
USE TEMPDESIG

ENDIF 

IF Ematch=1
  APPEND FROM TEMPLIB FOR D='E'
ENDIF
IF Hmatch=1
  APPEND FROM TEMPLIB FOR D='H'
ENDIF
IF Omatch=1
  APPEND FROM TEMPLIB FOR D='O'
ENDIF
IF Imatch=1
  APPEND FROM TEMPLIB FOR D='I'.OR.D='X'.OR.D='N'
ENDIF
IF Xmatch=1
  APPEND FROM TEMPLIB FOR D='X'
ENDIF
COUNT TO ANALTOT
SET TALK OFF



DO CASE 
CASE PTF=0
  SET DEVICE TO PRINT
  SET PRINT ON
  EJECT
CASE PTF=1
  SET ALTERNATE TO "Total function sort.txt"
  *SET ALTERNATE TO "H and O function sort.txt"
  *SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance sort.txt"
  *SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance con.txt"
  *SET ALTERNATE TO "Shear Stress HUVEC 2:Function sort.txt"
  *SET ALTERNATE TO "Shear Stress HUVEC 2:Distribution sort.txt"
  *SET ALTERNATE TO "Shear Stress HUVEC 1:Clone list.txt"
  *SET ALTERNATE TO "Shear Stress HUVEC 2:Location sort.txt"
  SET ALTERNATE ON
ENDCASE

IP PRINTDN=1 

|jU30 SAY -Database Subset Analysis' STYLE 65536 PONT "Ganeva-,274 COLOR 0,0,0,-1,-1,-1 

? 
7 
? 

7 

? date()
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
  ? 'All libraries'
ENDIF
IF ENTIRE=2
  MARK=1
  DO WHILE .T.
    IF MARK>STOPIT
      EXIT
    ENDIF
    USE SELECTED
    GO MARK
    ? ' '
    ?? TRIM(libname)
    STORE MARK+1 TO MARK
    LOOP
  ENDDO
ENDIF

? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
  ?? 'All'
ENDIF
IF Ematch=1
  ?? 'Exact, '
ENDIF
IF Hmatch=1
  ?? 'Human, '
ENDIF
IF Omatch=1
  ?? 'Other sp. '
ENDIF
IF Imatch=1
  ?? 'INCYTE'
ENDIF
IF Xmatch=1
  ?? 'EST'

ENDIF
IF CONDEN=1
  ? 'Condensed format analysis'
ENDIF
IF ANAL=1
  ? 'Sorted by NUMBER'
ENDIF
IF ANAL=2
  ? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
  ? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
  ? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
  ? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
  ? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
  ? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?

V '1 = library d = designation f = distribution z = location r = function c * cer 

* — ***** 

SCra 1 TYPE 0 HEADI.NO "Screen 1- AT 40,2 SIZE 286,492 PIXELS FONT -Geneva-' 7 COLOR 0,0,0, 

CASE ANAL=1
  * sort/number
  SET HEADING ON
  IF CONDEN=1
    SORT TO TEMP1 ON ENTRY, NUMBER
    DO "COMPRESSION number.PRG"
  ELSE
    SORT TO TEMP1 ON NUMBER
    USE TEMP1
    list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR
    ERASE TEMP1.DBF
  ENDIF

CASE ANAL=2
  * sort/DESCRIPTOR
  SET HEADING ON

*™ 25 S 1 W DESCRIPTOR , ENTRY , NUMBER/ S for D= 'S\OR.D='K\OR.D= '0' OR D-'X' OR n a .T. 
J^JPJ?® 1 ON ENTRY, DESCRIPTOR, NUMSSR/S for D='E' OR S'H' OR £'0» 3 fc'X- '22 nl'T. 

DO "COMPRESSION entry. PRO* 



USE TEMPI 

E RASE TEMPI, DBF 
ENDIF 
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CASE ANAL=3 
* sort by abundance 
SET H2ADIN0 Of 



^^w™? 1 ON ENTRY, NUMBER for D='E' .OR.D='H« .GR.Ds'O' .OR.Dx'X- OR Do'T' 

DO ■compression abundance. erg* * .or.d= 1 



CASE ANAL=4
  * sort/interest
  SET HEADING ON
  IF CONDEN=1
    SORT TO TEMP1 ON ENTRY, NUMBER FOR I>0
    DO "COMPRESSION interest.PRG"
  ELSE
    SORT ON I/D, ENTRY TO TEMP1 FOR I>1
    USE TEMP1

    ERASE TEMP1.DBF
  ENDIF

CASE ANAL=5 
* arrange/location 
SET HEADING ON 
STORE 4 TO AMPLIFIER 
? •Nuclear: 1 

DO "Compression location. prg* 
ELSE 

DO 'Normal subroutine 1* 
ENDE? 

? 'Cytoplasmic: 1 

i? I ?oS^ Y ' NUM3ER PIELDS ^'^^' L ' D ' F 'Z'^C. E ^V. S , D ESC^ ProR ,c^TH.I N IT < I.CO^ 

DO "Compression location. prg" 
ELSE 

DO "Normal subroutine l m 

ENDIF 

? •Cytbsfceleton: ' 

iT^S^^^ ?IELDS ^^^'^'^'^RY^^ 

DO 'Compression location.org" 
ELSE 

TO •Normal subroutine 1" 
ENDIF 

? 'Cell surface: 1 

?? R 3cSS! RV ' NUMBSR ^' l ^' L ' D ^'2^^^V, S .D ES CRl PTOR ,t^ >njIT , t( CC^ 
DO "Compression location. prg" 

DO •Normal subroutine 1* 
ENDIF 

? 'Intracellular membrane: 1 

DO "Compression location. prg" 

DO "Normal subroutine l n 
ENDIF 

? 'Mitochondrial: 1 

DO "Compression location. prg- 
ELSE. 

DO ■Normal subroutine 1" 
ENDIF 
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7 1 Secreted] ' 

g^csowreflflioa locaticn.pr*' 

DO -Normal subroutine 1" 
ENDIF 

? 'Otherr 

Dp -Conprasflion locatioa.pro" 

DO "Normal subroutine 1" 
ENDIF 

? 'WnJcnownj 1 

^Compression location.prg" 

DO 'Normal subroutine 1- 
ENDIF 

IF CONBSNsl 

HE PRINTER 
SET PRINTER ON 
EJECT 

f5L'° Btp,lt headinff.prg* 

"Analysis locatioi.dbf • 
"""Create bargraph.prg' 
SET HHADIKO OFF 
I ' FUNCTIONAL CLASS 

? TOTAL UNIQUE NEW % TOTAL' 

ERASE TEK?2 . DBF 
SET HEADING ON 

JJSE^ ■ SmartGuy ; FoxBASS* /Mac : fox files iraMBASIER.dbf • 
CASE ANAL=S 

*arrange/distribution 

SET HEADING ON 

STORE 3 TO AMPLIFIER 

i^^/ciasue specific distribution: 1 

00 " Compression distrib.pry- 

CO "Normal subroutine 1" 
ENDIF 

L^-sP^fic diBtribuctoni ' 

PO ■Compression distrii.prg" 

DO "Normal subroutine 1" 
ENDIF 

LJ?** 1 * 1 distribution: ■ 

^^Compression distrib.prg- 

DO •Noxxral subroutine 1" 
EttDIF 

IF CQNDENel 

SET DEVICE TO PRINTER 

SET PRINTER ON 
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EJECT 

00 "Output heading. prg 1 

USE "Analysis distribution.dbf ■ 
DO •Creates bargraph.prg" 
SST HEADING OFF 

1 \ FUNCTIONAL CLASS TOTAL UNIQUE % TOTAL' 

IS^^^SE! ^ t ^^'^ JQ^ ^/Q2^^ f PERCENT, GRAPH 
Cm75£ DATABASES 

ERASE TEMP2.DBF 

SET HEADING ON 

^E^°SmartGuy:PoxHASE+/Mac:rox files : TEMPMASTER. dbf u 

CASE ANAL=? 

* arrange/function 
SET HEADING ON 
STORE 10 TO AMPLIFIER 

* ' BINDING PROTEINS 1 
? 1 Surface molecules and receptors ; ' 

DO *Campression function. pro" 
ELSE 

DO "Normal subroutine 1" 



? 'Calcium-binding proteins; ' 

S R SqSS RY ' NUMBSR FIEtDS ^• i ^' L ' D ' F 'Z'R'C'^ S/ D ESCM P TOR> i^,i taT<I , C0} ^ 

DO -Compression function .pro- 
ELSE 

DO •Normal subroutine 1° 
EKDIF 

? 'Ligands and effectors i ' 

^lcS^r ,mR FIELDS ^N^^.D.^Z.R-C.^RY.S^SCRIFTOR,!^,!^,!^^ 

DO 'Compression function. pro" 
ELSE 

DO 'Normal subroutine l* 
ENDIF 

? 'Other binding proteins j ' 
DO •Compression function •dm" 



DO •Normal subroutine l b 

ENDXF 

•EJECT 

I ' ONCOGENES' 



? 'General oncogenes! 1 

S R SSS RY ' N0MB2R P1ELDS ~'*»W.«.*.C < ain D r. 8 ,SBSra^ 

DO "Compression function .pro" 
ELSE 



DO •Normal subroutine 1" 
ENDIF 



? 'GTP-binding proteins i' 

^ '•Compression function.prg" 
ELSE 



DO "Normal subroutine 1 
ENDIF 

? "Viral elements i » 
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fiOOTOT^y^a FIELDS IWiraKH^ 

DO "Compression function. prg" 

DO "Normal subroutine 1" 
ENDIF 

? 'Kinases and Phosphatases:' 

DO "expression function. prg* 

DO "Normal subroutine 1' 
ENDIF 

? 'Tumor-related antigens r 

S H ?qSeS RY ' NUMBER ?IELD3 ^•^^' L ' D ' F ' Z ^»C-HOTRy.S,D E SCRI PTO R. I ^ IGTO(INIT/I<ca ^ 

DO "Compression function, prg' 
ELSE 

DO "Normal subroutine 1' 

SNDZF 

♦EJECT 

J ' PROTEIN SYNTHETIC MACHINERY PROTEINS' 

LZ r SS 8 l ripti0Zl and »wcleic Acid-banding proteins: 1 

DO "Compression function. pre* 
ELSE 

DO 'Normal subroutine l 1 

EWDIF 

? 'Translation: ' 

DO "Compression function. prg" 
ELSE 

DO •Normal subroutine 1* 
ENDIF 

? 'Ribo social proteins: * 

S R lcS^ Rr,mjMBSR P1ELDS ^'«^.^o^.*.*.e,«wsw ( 6,oraPTO.iaenB. znxt.x.comm 

DO "Compression function.prg" 



DO "Normal subroutine 1" 
ENDIF 

? 'Protein processing: 1 

DO 'Compression function. prg". 
ELSE 

DO "Normal subroutine 1 B 

ENDIF 

*BJBCT 

I 1 ENZYMES' 

? 'Ferroproteinsi ' 

IF R ?cSe^T RY ' OTMBER ™* l ™' W ™' L - D ' P ' a ' R '* 

DO "Conpression function .prg" 
PLSE 

DO "Normal subroutine 1" 
ENDIF 

? 'Proteases and inhibitors: 1 

DO • Compres si n function. prg" 
ELSE 
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DO 'Normal subroutine I" 
ENDXF 

7" Oxidative phosphorylation; • 

K^conpreaaioa function. prg" 

DO "Normal subroutine !■ 
E7DIF 

? 'Sugar -metabolism* ' 

!gU«.-l« FISLDS ^^.^F.Z..^,^,™,^,^,^ 
DO^ Conpression function. prg' 

M "Normal subroutine 1' 
EOIF 

V 'Amino acid metabolism: • 

t^Coinpression function. prg' 

DO "Normal subroutine 1' 
ENDXF 

? 'Nucleic acid metabolism: * 
^Compression function, prg- 

ELSE 

DO * Normal subroutine !• 
ENDIF 

? 'Lipid metabolism: ' 

SORT ON ENTRY, NUMBER FIELDS RF2ND number t n * * t> ^ ^ 

5? ^ Compression function. prg" 



DO -Normal subroutine 1- 
ENDIF 

? 1 Other en2ymes i 1 

^Compression function .prg- 

DO •Normal subroutine 1° 

ENDIF 

♦EJECT 

? » 

? MISCELLANEOUS CATEGORIES 1 

I' Stress' response: 1 

ffSSff'*'— ^ ^.m^,uo.r.z,n.c, Brm , s ,r^^,^^ lfaam 

DO •Compression functioh.prg- 



DO •Normal subroutine 1" 
ENDIF 

? 'Structural!' 

^^Compression f unction •prg- 

DO -Normal subroutine 1" 
ajDiF 

? 'Other clones! ' 
W^Conpresaion fun 0t i n.prg- 
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DO "Normal subroutine 1" 
ENDIF 

? 'Clones of unknown function: 1 
DO •CozipreBsion function •pra; - 



DO "Normal subroutine 1" 
END1F 

IF CONDENnl 
EJECT 

♦SET DEVICE TO PRINTER 
♦SET PRINT ON 
DO^'Output heading .prg w 

USE "Analysis function. dbf" 

DO "Create bargraph.prg" 

SET HEADING OFF 
*★* 

SCREW 1 TSfPS 0 HEADING "Screen 1" AT 40,2 SIZE 2*6,492 PIXELS "QenBva-,12 COLOR 0.0,0 

O I 

? . cUwcTiuNftL CLASS CLONES GENES GENES FUNCTIONAL CLASS' 

t T^ T n2?^^5 S ^ P : CLONES , GZNES , NEW, PERCENT, GRAPH ,CCMFANY 

%ZL°Z* FI2LDS P * NAME , CLONES , GENES , NEW , PERCENT , GRAPH 
CLOSE DATABASES 
ERASE TEMP2.DBF 
SET HEADING CN 

•USE *SrrartGuy:FoxBASE+/Macifox files : TEMPMASTER • dbf ■ 
ENDIF 

CASE ANAL=8
  DO "Subgroup summary 3.prg"
ENDCASE

DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
*ERASE TEMPLIB.DBF
*ERASE TEMPNUM.DBF
*ERASE TEMPDESIG.DBF
*ERASE SELECTED.DBF
CLEAR
LOOP
ENDDO
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* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS 
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    COUNT TO NEWGENES FOR D='H'.OR.D='O'
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*GO TOP
STORE Z TO LOC
USE "Analysis location.dbf"
LOCATE FOR Z=LOC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'

* 1 r _ , v Coincidence 1 

list off fields mmiber ( Rron>,L,D,F,Z,R,C,E^^ 

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
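
The compression subroutine above collapses a sorted clone list into one record per gene, stores the number of duplicates in RFEND, and then lists the genes by descending count. A compact Python sketch of that idea follows. It uses a group-by rather than the record-pointer loop of the listing, and the function name and sample entries are hypothetical.

```python
from itertools import groupby


def compress_entries(entries):
    """Collapse duplicate entries and count how often each one occurs.

    Returns (rows, unique, total): rows are (entry, count) pairs with the
    most abundant first, unique is the number of distinct entries, and
    total is the number of clones before compression.
    """
    ordered = sorted(entries)  # counterpart of SORT ON ENTRY
    rows = [(entry, len(list(group))) for entry, group in groupby(ordered)]
    # Counterpart of SORT ON RFEND/D; the sort is stable, so entries with
    # equal counts stay in alphabetical order.
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows, len(rows), len(ordered)


# Hypothetical entries, for illustration only.
rows, unique, total = compress_entries(
    ["actin", "ef1a", "actin", "tubulin", "actin", "ef1a"]
)
```

The two totals correspond to the "genes, for a total of ... clones" line that each subroutine prints.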
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* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS 
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON DATE TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,4,0)
?? ' clones'
?
? '    V Coincidence'
COUNT TO P4 FOR I=4
IF P4>0
  ? STR(P4,3,0)
  ?? ' genes with priority = 4 (Secondary analysis:)'
  list off fields number,RFEND,L,D,F,Z,R,C,S,DESCRIPTOR,LENGTH,INIT for I=4
ENDIF
COUNT TO P3 FOR I=3
IF P3>0
  ? STR(P3,3,0)
  ?? ' genes with priority = 3 (Full insert sequence:)'
  list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=3
ENDIF
COUNT TO P2 FOR I=2
IF P2>0
  ? STR(P2,3,0)
  ?? ' genes with priority = 2 (Primary analysis complete:)'
  list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=2
ENDIF
COUNT TO P1 FOR I=1
IF P1>0
  ? STR(P1,3,0)
  ?? ' genes with priority = 1 (Primary analysis needed:)'
  list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=1
ENDIF
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON NUMBER TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? '    V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
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* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS 

USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    COUNT TO NEWGENES FOR D='H'.OR.D='O'
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
GO TOP
STORE R TO FUNC
USE "Analysis function.dbf"
LOCATE FOR P=FUNC
*REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
SET HEADING ON
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
***
? '    V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
* SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,
*list off fields RFEND,S,DESCRIPTOR
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
GO TOP
STORE F TO DIST
USE "Analysis distribution.dbf"
LOCATE FOR P=DIST
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? '    V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
GO TOP
USE TEMP1
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? '    V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
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iw^EFSf* SUBROUTINE TOR ANALYSIS PROGRAMS 
USE TEMPI 

OTUOT TO IDQENE FOR D-'E 1 .OR.Ifa'O' .OR.D='H- .OR.D='N' OR D-'R' nn n-.», 



COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D, NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'



LI K°°i? Cide 55 e V v Clones/10000* 

set heading off 



CLOSE DATABASES 
ERASE TEMPI. DBF 
ERASE TEM?2 .DBF 
USE *SmartGuy:FoxBASEt/Mac:fo5c f iles: clones. dbf 
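This variant of the subroutine also expresses each count as clones per 10,000 identified clones (START = RFEND/IDGENE*10000) and sorts the result by descending count. The Python sketch below illustrates that normalization under stated assumptions: the function name, the sample counts, and the total of 2,000 identified clones are hypothetical, and ties are broken by entry name here rather than by the NUMBER field used in the listing.

```python
def abundance_table(counts, identified_total):
    """Return (entry, count, count per 10,000 identified clones), most abundant first."""
    if identified_total <= 0:
        raise ValueError("identified_total must be positive")
    rows = [(entry, n, n * 10000 / identified_total) for entry, n in counts.items()]
    # Descending count, as in SORT ON RFEND/D; entry name is an assumed tie-breaker.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Hypothetical counts for illustration only.
counts = {"HUMACT": 3, "HUMEF1B": 6, "HUMGAPDH": 1}
table = abundance_table(counts, identified_total=2000)
for entry, n, per_10000 in table:
    print(f"{entry:10s} {n:5d} {per_10000:8.1f}")
```

Normalizing to a fixed denominator is what lets counts from libraries of different sizes be compared on one scale.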
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
COUNT TO IDGENE FOR D='E'.OR.D='O'.OR.D='H'.OR.D='N'.OR.D='R'.OR.D=[illegible]
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
   IF MARK1 >= TOT
      PACK
      COUNT TO UNIQUE
      SW2=1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   SW = 0
   DO WHILE SW=0 TEST
      SKIP
      STORE ENTRY TO TESTB
      IF TESTA = TESTB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW=1
      LOOP
   ENDDO TEST
   LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D, NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? 'Coincidence   V   V   Clones/10000'
set heading off
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
* [illegible commented-out line ending in "Geneva",7 COLOR 0,0,0]
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
USE TEMP1
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
?
*list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND,INIT,I
list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
* Lifeseq menu; version 8-7-94
SET TALK OFF
set device to screen
CLEAR
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
STORE LUPDATE() TO Update
GO BOTTOM
STORE RECNO() TO cloneno
STORE 6 TO Chooser
DO WHILE .T.
* Program.: Lifeseq menu.fmt
* Date....: 1/11/95
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Lifeseq menu

fSi S!?>2 STEMS \l$ ^•SS^S&'TUS n ^SS^Sf ^ 

9 JOELS 110,29 TO 188,217 SttLS 3871 SK ofo. -i,?25600 -i '? ' 21 ' ~ 15 ' 25 

2 IS5S «* wTtol,- STYLE kit KHT "Geneva ■ ^aUGcVi l i i 

J KJBLS 45,296 Sly -V1.30- SKIS 65535 FCMI Hmmf .TK^Sui oToTS!. ' - ' 1 

* EOF: Lifeseq menu.fmt
READ
DO CASE
CASE Chooser=1

^ E S^2° XB ^ +/MaCl£0>C fil * s:0ut P ut Programs (Master analysis 3.prg- 
' ° X3ASE * /MaC: f °* £ileB:0ut P ut Programs Subtraction 2.prg« 

' (single) .prg" 

USE "Libraries.dbf"
BROWSE 

CASE Choosers 

^^6^o6° XE ^ E+/MaC: f ° X filQa:0ut P ut ProgramsiSee individual clone. prg' 

^toE^^^ 0 " file Sl Ubrari es ,Output programs: Menu, pry" 
CLEAR 

SCREEN 1 OFF 

RETURN 

ENDCASE 

LOOP 
ENDDO
@ 1,30 SAY "Database Subset Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1
?
?
? DATE()
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
   ? 'All libraries'
ENDIF
IF ENTIRE=2
   MARK=1
   DO WHILE .T.
      IF MARK>STOPIT
         EXIT
      ENDIF
      USE SELECTED
      GO MARK
      ? '  '
      ?? TRIM(libname)
      STORE MARK+1 TO MARK
      LOOP
   ENDDO
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0
   ?? 'All'
ENDIF
IF Ematch=1
   ?? 'Exact,'
ENDIF
IF Hmatch=1
   ?? 'Human,'
ENDIF
IF Omatch=1
   ?? 'Other sp.'
ENDIF
IF CONDEN=1
   ? 'Condensed format analysis'
ENDIF
IF ANAL=1
   ? 'Sorted by NUMBER'
ENDIF
IF ANAL=2
   ? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
   ? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
   ? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
   ? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
   ? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
   ? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?
?
USE TEMP1
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
?
list off fields number, L, D, F, Z, R, C, ENTRY, DESCRIPTOR, LENGTH, RFEND, INIT, I
ERASE TEMP1.DBF
USE TEMPDESIG



WO 95/20681

PCT/US95/01160



USE TEMP1
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
?
list off fields number, L, D, F, Z, R, C, ENTRY, DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
* Northern (single), version 11-25-94
close databases
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
CLEAR
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.
* Program.: Northern (single).fmt
* Date....: 8/ 8/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)

f?Si ^JSSfc'Szr^oSai IT^tllo^l F0NT ■ Geaeva '' 12 

! SEE* 89 ' 79 TO 192 ' 422 ST^ 2 28 «7 COLOR 6,6,0.-25600 

I H!'?5, SAy ,Eatry #: " STaB 6S 536 TOOT "Geneva " , 12 COLOR 0,0,0 -1 -1 -1 

f f™9 115.173 GET Eobject smfi 0 FONT -Geneva',^ SIZ3 15,142 COLOR 0 6 0 -1 1 i 

I SSS^f lihl 13 QST Dob i ect BKls 0 ram- -Geneva '.12 SIZ3 15.241 COLO^ 0 6 0 -1 -1 -1 

» 555f ^A 8 ?^ " Single 8ea *<* screen- STHE6S536 ram^SSt 274 CO^R 0 0 - 

w r-AJu-s- say "Clone STYLE $5536 FOOT •Geneva , '-12 color o n n -i -i 1 

PIXELS 80,152 8W -Eater any ONE of the following:- STYLE 65™? Soot ■ Leva- ! U COLOR -1, 

* EOF: Northern (single).fmt
READ
IF Bail=2
   CLEAR
   screen 1 off
   RETURN
ENDIF
USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON

IF Eobject<>' '
   STORE UPPER(Eobject) TO Eobject
   SET SAFETY OFF
   SORT ON Entry TO "Lookup entry.dbf"
   SET SAFETY ON
   USE "Lookup entry.dbf"
   LOCATE FOR Look=Eobject
   IF .NOT. FOUND()
      CLEAR
      LOOP
   ENDIF
   BROWSE
   STORE Entry TO Searchval
   CLOSE DATABASES
   ERASE "Lookup entry.dbf"
ENDIF

IF Dobject<>' '
   SET EXACT OFF
   SET SAFETY OFF
   SORT ON descriptor TO "Lookup descriptor.dbf"
   SET SAFETY ON
   USE "Lookup descriptor.dbf"
   LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))
   IF .NOT. FOUND()
      CLEAR
      LOOP
   ENDIF
   BROWSE
   STORE Entry TO Searchval
   CLOSE DATABASES
   ERASE "Lookup descriptor.dbf"
   SET EXACT ON
ENDIF

IF Numb<>0
   USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
   GO Numb
   BROWSE
   STORE Entry TO Searchval
ENDIF

CLEAR
? 'Northern analysis for entry '
?? Searchval
?
? 'Enter Y to proceed'
WAIT TO OK
CLEAR
IF UPPER(OK)<>'Y'
   screen 1 off
   RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
   IF MARK1 >= TOT
      PACK
      SW2=1
      LOOP
   ENDIF
   GO MARK1
   STORE library TO TESTA
   SKIP
   STORE library TO TESTB
   IF TESTA = TESTB
      DELETE
   ENDIF
   MARK1 = MARK1+1
   LOOP
ENDDO ROLL

* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON
CLOSE DATABASES
SELECT 1
USE "Compressed libraries.dbf"
STORE RECCOUNT() TO Entries
SELECT 2
USE "Hits.dbf"
Mark=1
DO WHILE .T.
   SELECT 1
   IF Mark>Entries
      EXIT
   ENDIF
   GO MARK
   STORE library TO Jigger
   SELECT 2
   COUNT TO Zog FOR library=Jigger
   * Assumed step, absent from the scanned listing: return to the
   * libraries file, which holds the HITS field browsed below.
   SELECT 1
   REPLACE hits WITH Zog
   Mark=Mark+1
   LOOP
ENDDO

SELECT 1
BROWSE FIELDS LIBRARY, LIBNAME, ENTERED, HITS AT 0,0
? 'Enter Y to print: '
WAIT TO PRINSET
IF UPPER(PRINSET)='Y'
   SET PRINT ON
   CLEAR
   EJECT
   SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",14 COLOR 0,0,0
   ? 'DATABASE [illegible]'
   ?? Searchval
   ? DATE()
   SELECT 2
   LIST OFF FIELDS NUMBER, LIBRARY, D, S, F, Z, R, ENTRY, DESCRIPTOR, START, RFSTART, RFEND
   SET PRINT OFF
ENDIF
CLOSE DATABASES
SET TALK OFF
CLEAR
DO "Test print.prg"
RETURN
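The Northern (single) program above copies every clone whose ENTRY matches the search value into a hits file and then, for each library in the compressed libraries file, counts how many of those hits came from that library. The following Python sketch shows that tally under stated assumptions: the function name `electronic_northern` and the record layout are hypothetical, the clone numbers and library codes for HUMEF1B are those listed in Table 6, and the HUMACT row is an invented non-matching clone.

```python
from collections import Counter


def electronic_northern(clones, libraries, entry):
    """Count, per library, the clones whose entry matches `entry`.

    clones    -- iterable of (number, library, entry) records
    libraries -- iterable of library codes; duplicates are collapsed
    """
    hits = Counter(lib for _number, lib, e in clones if e == entry)
    return [(lib, hits.get(lib, 0)) for lib in sorted(set(libraries))]


clones = [
    (2304, "U937NOT01", "HUMEF1B"),
    (3240, "HMC1NOT01", "HUMEF1B"),
    (3269, "HMC1NOT01", "HUMEF1B"),
    (4693, "HMC1NOT01", "HUMEF1B"),
    (8989, "HMC1NOT01", "HUMEF1B"),
    (9139, "HMC1NOT01", "HUMEF1B"),
    (9999, "THP1NOB01", "HUMACT"),  # hypothetical non-matching clone
]
libraries = ["U937NOT01", "HMC1NOT01", "THP1NOB01", "HMC1NOT01"]
result = electronic_northern(clones, libraries, "HUMEF1B")
for lib, n in result:
    print(lib, n)
```

Libraries with no matching clone are kept with a count of zero, which mirrors the listing's practice of writing a HITS value for every library record.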
TABLE 6

library      libname
ADENINB01    Inflamed adenoid
ADRENOR01    Adrenal gland (r)
ADRENOT01    Adrenal gland (T)
AMLBNOT01    AML blast cells (T)
BMARNOT01    Bone marrow
BMARNOT02    Bone marrow (T)
CARDNOT01    Cardiac muscle (T)
CHAONOT01    Chin. hamster ovary
CORNNOT01    Corneal stroma
FIBRAGT01    Fibroblast, AT 5
FIBRAGT02    Fibroblast, AT 30
FIBRANT01    Fibroblast, AT
FIBRNGT01    Fibroblast, uv 5
FIBRNGT02    Fibroblast, uv 30
FIBRNOT01    Fibroblast
FIBRNOT02    Fibroblast, normal
HMC1NOT01    Mast cell line HMC-1
HUVELPB01    HUVEC IFN,TNF,LPS
HUVENOB01    HUVEC control
HUVESTB01    HUVEC shear stress
HYPONOB01    Hypothalamus
KIDNNOT01    Kidney (T)
LIVRNOT01    Liver (T)
LUNGNOT01    Lung (T)
MUSCNOT01    Skeletal muscle (T)
OVIDNOB01    Oviduct
PANCNOT01    Pancreas, normal
PITUNOR01    Pituitary (r)
PITUNOT01    Pituitary (T)
PLACNOB01    Placenta
SINTNOT02    Small intestine (T)
SPLNFET01    Spleen+liver, fetal
SPLNNOT02    Spleen (T)
STOMNOT01    Stomach
SYNORAB01    Rheum. synovium
TBLYNOT01    T + B lymphoblast
TESTNOT01    Testis (T)
THP1NOB01    THP-1 control
THP1PEB01    THP phorbol
THP1PLB01    THP-1 phorbol LPS
U937NOT01    U937, monocytic leuk

number  library    d s f z r  entry    descriptor                [illegible header]  rfend
2304    U937NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  [illegible]         773
3240    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0 370               773
3269    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0 371               773
4693    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0 470               773
8989    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0 327               773
9139    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0 375               773
WHAT IS CLAIMED IS:

1. A method of analyzing a specimen, containing gene
transcripts, said method comprising the steps of:

(a) producing a library of biological sequences;

(b) generating a set of transcript sequences, where
each of the transcript sequences in said set is indicative
of a different one of the biological sequences of the
library;

(c) processing the transcript sequences in a
programmed computer in which a database of reference
transcript sequences indicative of reference biological
sequences is stored, to generate an identified sequence
value for each of the transcript sequences, where each said
identified sequence value is indicative of a sequence
annotation and a degree of match between one of the
transcript sequences and at least one of the reference
transcript sequences; and

(d) processing each said identified sequence value to
generate final data values indicative of a number of times
each identified sequence value is present in the library.

2. The method of claim 1, wherein step (a) includes
the steps of:

obtaining a mixture of mRNA;

making cDNA copies of the mRNA;

isolating a representative population of clones
transfected with the cDNA and producing therefrom the
library of biological sequences.

3. The method of claim 1, wherein the biological
sequences are cDNA sequences.

4. The method of claim 1, wherein the biological
sequences are RNA sequences.

5. The method of claim 1, wherein the biological
sequences are protein sequences.
6. The method of claim 1, wherein a first value of
said degree of match is indicative of an exact match, and a
second value of said degree of match is indicative of a
non-exact match.

7. A method of comparing two specimens containing
gene transcripts, said method comprising:

(a) analyzing a first specimen according to the
method of claim 1;

(b) producing a second library of biological
sequences;

(c) generating a second set of transcript sequences,
where each of the transcript sequences in said second set
is indicative of a different one of the biological
sequences of the second library;

(d) processing the second set of transcript sequences
in said programmed computer to generate a second set of
identified sequence values known as further identified
sequence values, where each of the further identified
sequence values is indicative of a sequence annotation and
a degree of match between one of the biological sequences
of the second library and at least one of the reference
sequences;

(e) processing each said further identified sequence
value to generate further final data values indicative of a
number of times each further identified sequence value is
present in the second library; and

(f) processing the final data values from the first
specimen and the further identified sequence values from
the second specimen to generate ratios of transcript
sequences, each of said ratio values indicative of
differences in numbers of gene transcripts between the two
specimens.

8. A method of quantifying relative abundance of mRNA
in a biological specimen, said method comprising the steps
of:

(a) isolating a population of mRNA transcripts from
the biological specimen;
(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts.

9. A diagnostic method which comprises producing a
gene transcript image, said method comprising the steps of:

(a) isolating a population of mRNA transcripts from a
biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts, where data determining the
relative abundance values of mRNA transcripts is the gene
transcript image of the biological specimen.

10. The method of claim 9, further comprising:

(e) providing a set of standard normal and diseased
gene transcript images; and

(f) comparing the gene transcript image of the
biological specimen with the gene transcript images of step
(e) to identify at least one of the standard gene
transcript images which most closely approximate the gene
transcript image of the biological specimen.

11. The method of claim 9, wherein the biological
specimen is biopsy tissue, sputum, blood or urine.

12. A method of producing a gene transcript image,
said method comprising the steps of:

(a) obtaining a mixture of mRNA;

(b) making cDNA copies of the mRNA;
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a suitable vector and 
using said vector to transfect suitable host strain cells
which are plated out and permitted to grow into clones,
each clone representing a unique mRNA;

(d) isolating a representative population of
recombinant clones;

(e) identifying amplified cDNAs from each clone in
the population by a sequence-specific method which
identifies gene from which the unique mRNA was transcribed;

(f) determining a number of times each gene is
represented within the population of clones as an
indication of relative abundance; and

(g) listing the genes and their relative abundance in
order of abundance, thereby producing the gene transcript
image.

13. The method of claim 12, also including the step
of diagnosing disease by:

repeating steps (a) through (g) on biological
specimens from random sample of normal and diseased humans
encompassing a variety of diseases, to produce reference
sets of normal and diseased gene transcript images;

obtaining a test specimen from a human, and producing
a test gene transcript image by performing steps (a)
through (g) on said test specimen;

comparing the test gene transcript image with the
reference sets of gene transcript images; and

identifying at least one of the reference gene
transcript images which most closely approximates the test
gene transcript image.

14. A computer system for analyzing a library of
biological sequences, said system including:

means for receiving a set of transcript sequences,
where each of the transcript sequences is indicative of a
different one of the biological sequences of the library;
and

means for processing the transcript sequences in the
computer system in which a database of reference transcript
sequences indicative of reference biological sequences is
stored, wherein the computer is programmed with software
for generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and a degree
of match between a different one of the biological
sequences of the library and at least one of the reference
transcript sequences, and for processing each said
identified sequence value to generate final data values
indicative of a number of times each identified sequence
value is present in the library.

15. The system of claim 14, also including:

library generation means for producing the library of
biological sequences and generating said set of transcript
sequences from said library.

16. The system of claim 15, wherein the library
generation means includes:

means for obtaining a mixture of mRNA;

means for making cDNA copies of the mRNA;

means for inserting the cDNA copies into cells and
permitting the cells to grow into clones;

means for isolating a representative population of the
clones and producing therefrom the library of biological
sequences.
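By way of illustration only, the comparison recited in claim 7, step (f), can be pictured as dividing each sequence's relative abundance in one library by its relative abundance in the other. The Python sketch below is not the claimed method or the Appellants' software: the function name, the use of library totals as denominators, the choice to report None where the second library has no counts, and the sample numbers are all assumptions made for the example.

```python
def abundance_ratios(counts_a, counts_b):
    """Ratio of relative abundance in library A to that in library B, per entry.

    Entries absent from library B get None, since the ratio is undefined.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both libraries must contain at least one clone")
    ratios = {}
    for entry in sorted(set(counts_a) | set(counts_b)):
        rel_a = counts_a.get(entry, 0) / total_a
        rel_b = counts_b.get(entry, 0) / total_b
        ratios[entry] = rel_a / rel_b if rel_b > 0 else None
    return ratios


# Hypothetical counts for two libraries of 40 clones each.
first = {"G1": 10, "G2": 30}
second = {"G1": 20, "G2": 15, "G3": 5}
print(abundance_ratios(first, second))
print(abundance_ratios(second, first))
```

A ratio above 1 marks a transcript that is relatively more abundant in the first library, and a ratio below 1 marks one that is relatively more abundant in the second.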
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gtoqacttgaatgccccgacatcttcqactgt- 

GCGGTATTTCACAOCG-3') were used to arnpflfy 
the LB43 sequence of pRS3l6. and the raactton 
product was UmukjnmJ Ho yeast tor one-step gene 
repbcaniai |R fiotfBtdn, Methods fisymot 1M. 
281 (I991fl. To or«atv1he«tf7fl.-.-L£U?rTuatton con- 
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To construct the *f na SA "t£b g aJete (a rt e tauj i ccr- 
re sp ondng to 931 amno adds} carried on P153, a 
LBJ2 fragment was used to raptace tha 23-W> Prrt 
I-Ecfl36 1 fragment of S7E23, which occurs within a 
&24tb Hnd R-Bgl B genomic fragment canted on 
pSP72 (Prernege). To create YEprWvAf, a 1.5-kb 
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Phngte, I. Herskowttz. Cel 65, 1213 (1991); S. 
Powers, £ Gonzales. T. Chistensen, J. Cuban. O. 
Broek, /bid., p. 1225; H. O. Park. J. Chant. I. Her- 
skowte. Mature 365. 269 (1993); J. Chant. Trends 
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"Vatiwes- included the blowing strains: Y175 
Hxni-LaJZ), Y223 (arffi-UW). Y234 («e23Ar 
LBJ2X and Y272 ^x/W:LBU2 sta23teL£UZL 
MATo certvalivas of EG123 hduded the fotowwig 
strains: Y214 (EG123 made M47o) and Y293 
i**n&rt£U2l AS strains were generated by means 
of standard genetic or mcaaosar methods InvoKing 

ste23 double mutant strains ware oaetad by croaa. 
tig of the appropriate MA 7a sfa23 and MATa ax/r 
mutants, toaowed by sporutatton of the resutant (fp. 
bid and isolation cf the double mutant from nonpa- 
rental di-typa tetrads. Gene dtarupttons were con* 
ffrmed with either PCR or Southern (DMA) analysis. 
31. pl29eaYEp352p.£H4A WMyerj.T.J.Ko- 
emar, A Tzagotoff. Yeast Z 163 (1986)) ptasmb con- 
tartnga5^(bSallfragrnaTtofp«Ll.pt5l was 
derfved from pl29 by rsarton of « Inker at the Bgl I 
site withh AXL r, which led to an Irvfrarne haenionof 
trwrarnaggUWn (HA) epitope (DCTrPYOVPDYA) (29) 
between amfrio acids 854 arid 655 of the Att.f prod> 



uct tC225 b a KS+ (Straagene) ptasrrtd ccrtahno 
tubon mutabons of the proposed actnn ska of Aadip 

Sr^S^S^- S'-S^CACAAAGCGCT. 
GCCAAACCGGC-3'; vdJ>E7lA, S'-AAGAATCAT- 

AAGAATCATGTGATCACAAAGGTGCGC-3') 'The 
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from the mutaoenizad dC225 r^Mmi^e on the manuscript Supported bv a 
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Quantitative Monitoring of Gene Expression 
Patterns with a Complementary DNA Microarray 

Mark Schena,* Dari Shalon,*† Ronald W. Davis,
Patrick O. Brown*

i^^^^^^^^^ of many genes ta 
The temporal, developmental, topographical, histological, and
physiological patterns in which a gene is expressed provide clues to
its biological role. The large and expanding database of
complementary DNA (cDNA) sequences from many organisms (1) presents
the opportunity of defining these patterns at the level of the whole
genome.

For these studies, we used the small flowering plant Arabidopsis
thaliana as a model organism. Arabidopsis possesses many advantages
for gene expression analysis, including the fact that it has the
smallest genome of any higher eukaryote examined to date (2).
Forty-five cloned Arabidopsis cDNAs (Table 1), including 14 complete
sequences and 31 expressed sequence tags (ESTs), were used as
gene-specific targets. We obtained the ESTs by selecting cDNA clones
at random from an Arabidopsis library. Sequence analysis revealed
that 28 of the 31 ESTs matched sequences in the database (Table 1).
Three additional cDNAs from other organisms served as controls in the
experiments.

The 48 cDNAs, averaging ~1.0 kb, were amplified with the polymerase
chain reaction (PCR) and deposited into individual wells of a 96-well
microtiter plate. Each sample was duplicated in two adjacent wells to
allow the reproducibility of the arraying and hybridization process
to be tested. Samples from the microtiter plate were printed onto
glass microscope slides in an area measuring 3.5 mm by 5.5 mm with
the use of a high-speed arraying machine (3). The arrays were
processed by chemical and heat treatment to attach the DNA sequences
to the glass surface and denature them (3). Three arrays, printed in
a single lot, were used for the experiments here. A single microtiter
plate of PCR products provides sufficient material to print at least
500 arrays.
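The plate arithmetic in the paragraph above (48 cDNAs, each placed in two adjacent wells of a 96-well plate) can be sketched as follows. The position labels follow the form used in the Position column of Table 1 (a1,2 through h11,12); the function name and the placeholder sample names are hypothetical, and the row-by-row filling order is an assumption taken from the order in which Table 1 lists the positions.

```python
def duplicate_well_layout(samples, rows="abcdefgh", columns=12):
    """Map each sample to a pair of adjacent wells, filling the plate row by row."""
    if len(samples) * 2 != len(rows) * columns:
        raise ValueError("need exactly one sample per pair of wells")
    layout = {}
    remaining = iter(samples)
    for row in rows:
        for col in range(1, columns + 1, 2):
            # e.g. "a1,2" means wells a1 and a2 hold the same sample
            layout[f"{row}{col},{col + 1}"] = next(remaining)
    return layout


samples = [f"cDNA{i}" for i in range(1, 49)]  # placeholder names
layout = duplicate_well_layout(samples)
print(len(layout), layout["a1,2"], layout["h11,12"])
```

Because every sample occupies two neighbouring wells, the two resulting array spots give a direct check on the reproducibility of printing and hybridization.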

Fluorescent probes were prepared from total Arabidopsis mRNA (4) by a
single round of reverse transcription (5). The Arabidopsis mRNA was
supplemented with human acetylcholine receptor (AChR) mRNA at a
dilution of 1:10,000 (w/w) before cDNA synthesis, to provide an
internal standard for calibration (5). The resulting fluorescently
labeled cDNA mixture was hybridized to an array at high stringency
(6) and scanned with a laser (3). A high-sensitivity scan gave
signals that saturated the detector at nearly all of the Arabidopsis
target sites (Fig. 1A). Calibration relative to the AChR mRNA
standard (Fig. 1A) established a sensitivity limit of ~1:50,000. No
detectable hybridization was observed to either the rat
glucocorticoid receptor (Fig. 1A) or the yeast TRP4 (Fig. 1A) targets
even at the highest scanning sensitivity. A moderate-sensitivity scan
of the same array allowed linear detection of the more abundant
transcripts (Fig. 1B). Quantitation of both scans revealed a range of
expression levels spanning three orders of magnitude for the 45 genes
tested (Table 2). RNA blots (7) for several genes (Fig. 2)
corroborated the expression levels measured with the microarray to
within a factor of 5 (Table 2).
Differential gene expression was investigated with a simultaneous,
two-color hybridization scheme, which served to minimize experimental
variation inherent in the comparison of independent hybridizations.
Fluorescent probes were prepared from two mRNA sources with the use
of reverse transcriptase in the presence of fluorescein- and
lissamine-labeled nucleotide analogs, respectively (5). The two
probes were then mixed together in equal proportions, hybridized to a
single array, and scanned separately for fluorescein and lissamine
emission after independent excitation of the two fluorophores (3).

To test whether overexpression of a single gene could be detected in
a pool of total Arabidopsis mRNA, we used a microarray to analyze a
transgenic line overexpressing the single transcription factor HAT4
(8). Fluorescent probes representing mRNA from wild-type and
HAT4-transgenic plants were labeled with fluorescein and lissamine,
respectively; the two probes were then mixed and hybridized to a
single array. An intense hybridization signal was observed at the
position of the HAT4 cDNA in the lissamine-specific scan (Fig. 1D),
but not in the fluorescein-specific scan of the same array (Fig. 1C).
Calibration with AChR mRNA added to the fluorescein and lissamine
cDNA synthesis reactions at dilutions of 1:10,000 (Fig. 1C) and 1:100
(Fig. 1D), respectively, revealed a 50-fold elevation of HAT4 mRNA in
the transgenic line relative to wild-type plants (Table 2). This
magnitude of HAT4 overexpression matched that inferred from the
Northern (RNA) analysis within a factor of 2 (Fig. 2 and Table 2).
Expression of all the other genes monitored on the array differed by
less than a factor of 5 between HAT4-transgenic and wild-type plants
(Fig. 1, C
[Fig. 2 axis labels: mRNA (µg); mRNA (ng)]

Fig. 2. Gene expression monitored with RNA blot analysis. Various
amounts of mRNA from wild-type and HAT4-transgenic plants were
spotted onto nylon membranes and probed with the cDNAs indicated.
Purified human AChR mRNA was used for calibration.
and D, and Table 2). Hybridization of fluorescein-labeled
glucocorticoid receptor cDNA (Fig. 1C) and lissamine-labeled TRP4
cDNA (Fig. 1D) verified the presence of the negative control targets
and the lack of optical cross talk between the two fluorophores.

To explore a more complex alteration in expression patterns, we
performed a second two-color hybridization experiment with
fluorescein- and lissamine-labeled probes prepared from root and leaf
mRNA, respectively. The scanning sensitivities for the two
fluorophores were normalized by matching the signals resulting from
AChR mRNA, which was added to both cDNA synthesis reactions at a
dilution of 1:1000 (Fig. 1, E and F). A comparison of the scans
revealed widespread differences in gene expression between root and
leaf tissue (Fig. 1, E and F). The mRNA from the light-regulated CAB1
gene was ~500-fold more abundant in leaf than in root tissue
(Fig. 1E). The expression of 26 other genes differed between root and
leaf tissue by more than a factor of 5.

The HAT4-transgenic line we examined has elongated hypocotyls, early
flowering, poor leaf expansion, and altered pigmentation (8).
Although changes in expression were

Table 1. cDNA targets on the microarray. The caption is largely
illegible in the source and the accession number column is not
recoverable. Abbreviations: NADH, nicotinamide adenine dinucleotide;
ATPase, adenosine triphosphatase.

Position   cDNA     Function
a1,2       AChR     Human AChR
a3,4       EST3     Actin
a5,6       EST6     NADH dehydrogenase
a7,8       AAC1     Actin 1
a9,10      EST12    Unknown
a11,12     EST13    Actin
b1,2       CAB1     Chlorophyll a/b binding
b3,4       EST17    Phosphoglycerate kinase
b5,6       GA4      Gibberellic acid biosynthesis
b7,8       EST19    Unknown
b9,10      GBF-1    G-box binding factor 1
b11,12     EST23    Elongation factor
c1,2       EST29    Aldolase
c3,4       GBF-2    G-box binding factor 2
c5,6       EST34    Chloroplast protease
c7,8       EST35    Unknown
c9,10      EST41    Catalase
c11,12     rGR      Rat glucocorticoid receptor
d1,2       EST42    Unknown
d3,4       EST45    ATPase
d5,6       HAT1     Homeobox-leucine zipper 1
d7,8       EST46    Light harvesting complex
d9,10      EST49    Unknown
d11,12     HAT2     Homeobox-leucine zipper 2
e1,2       HAT4     Homeobox-leucine zipper 4
e3,4       EST50    Phosphoribulokinase
e5,6       HAT5     Homeobox-leucine zipper 5
e7,8       EST51    Unknown
e9,10      HAT22    Homeobox-leucine zipper 22
e11,12     EST52    Oxygen evolving
f1,2       EST59    Unknown
f3,4       KNAT1    Knotted-like homeobox 1
f5,6       EST60    RuBisCO small subunit
f7,8       EST69    Translation elongation factor
f9,10      PPH1     Protein phosphatase 1
f11,12     EST70    Unknown
g1,2       EST75    Chloroplast protease
g3,4       EST76    Unknown
g5,6       ROC1     Cyclophilin
g7,8       EST82    GTP binding
g9,10      EST83    Unknown
g11,12     EST84    Unknown
h1,2       EST91    Unknown
h3,4       EST96    Unknown
h5,6       SAR1     Synaptobrevin
h7,8       EST100   Light harvesting complex
h9,10      EST103   Light harvesting complex
h11,12     TRP4     Yeast tryptophan biosynthesis

†No match in the database; novel EST.



observed for HAT4, large changes in expression were not observed for
any of the other 44 genes we examined. This was somewhat surprising,
particularly because comparative analysis of leaf and root tissue
identified 27 differentially expressed genes. Analysis of an expanded
set of genes may be required to identify genes whose expression
changes upon HAT4 overexpression; alternatively, a comparison of mRNA
populations from specific tissues of wild-type and [illegible]

At the current density of robotic printing, it is feasible to scale
up the fabrication process to produce arrays containing 20,000 cDNA
targets. At this density, a single array would be sufficient to
provide gene-specific targets encompassing nearly the entire
repertoire of expressed genes in the Arabidopsis genome. The
availability of 20,274 ESTs from Arabidopsis (1, 9) would provide a
rich source of templates for such studies.

The estimated 100,000 genes in the human genome (10) exceeds the
number of Arabidopsis genes by a factor of 5 (2). This modest
increase in complexity suggests that similar cDNA microarrays,
prepared from ESTs (1), could be used to determine the expression
patterns of tens of thousands of human genes in diverse cell types.
Coupling an amplification strategy to the reverse transcription
reaction (11) could make it feasible to monitor expression even in
minute tissue samples. A wide variety of acute and chronic
physiological and pathological conditions might lead to
characteristic changes in the patterns of gene expression in
peripheral blood cells or other easily sampled tissues. In concert
with cDNA microarrays for monitoring complex expression patterns,
these tissues might therefore serve as sensitive in vivo sensors for
clinical diagnosis. Microarrays of cDNAs could thus provide a useful
link between human gene sequences and clinical medicine.

Table 2. Gene expression monitoring by microarray [illegible]
analyses; tg, [illegible]. See Table 1 for additional gene
information. [Illegible] for the microarray were determined from
microarray [illegible]; values for the RNA blots were determined from
RNA blots (Fig. 2).
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ADA Immunodeficient Patients 
Severe combined immunodeficiency associated with inherited deficiency
of ADA (1) is usually fatal unless affected children are kept in
protective isolation or the immune system is reconstituted by bone
marrow transplantation from a human leukocyte antigen (HLA)-identical
sibling donor (2). This is the therapy of choice, although it is
available only for a minority of patients. In recent years, other
forms of therapy have been developed, including transplants from
haploidentical donors (3, 4), exogenous enzyme replacement (5), and
somatic-cell gene therapy (6-9).

We previously reported a preclinical model
in which ADA gene transfer and expression
successfully restored immune functions in human
ADA-deficient (ADA-) peripheral
blood lymphocytes (PBLs) in immunodeficient
mice in vivo (10, 11). On the basis of
these preclinical results, the clinical application
of gene therapy for the treatment of
ADA- SCID (severe combined immunodeficiency
disease) patients who previously failed
exogenous enzyme replacement therapy was
approved by our Institutional Ethical Committees
and by the Italian National Committee
for Bioethics (12). In addition to evaluating
the safety and efficacy of the gene therapy
procedure, the aim of the study was to define
the relative role of PBLs and hematopoietic
stem cells in the long-term reconstitution of
immune functions after retroviral vector-mediated
ADA gene transfer. For this purpose,
two structurally identical vectors expressing
the human ADA complementary DNA
(cDNA), distinguishable by the presence of
alternative restriction sites in a nonfunctional
region of the viral long-terminal repeat
(LTR), were used to transduce PBLs and bone
marrow (BM) cells independently. This procedure
allowed identification of the origin of
METHOD AND APPARATUS FOR FABRICATING MICROARRAYS OF BIOLOGICAL SAMPLES

Field of the invention 

This invention relates to a method and apparatus
for fabricating microarrays of biological samples for
large scale screening assays, such as arrays of DNA
samples to be used in DNA hybridization assays for
genetic research and diagnostic applications.

Background of the Invention

A variety of methods are currently available for
making arrays of biological macromolecules, such as
arrays of nucleic acid molecules or proteins. One
method for making ordered arrays of DNA on a porous
membrane is a "dot blot" approach. In this method, a
vacuum manifold transfers a plurality, e.g., 96,
aqueous samples of DNA from 3 millimeter diameter wells
to a porous membrane. A common variant of this
procedure is a "slot-blot" method in which the wells
have highly-elongated oval shapes.

The DNA is immobilized on the porous membrane by 
baking the membrane or exposing it to UV radiation.
This is a manual procedure practical for making one 
array at a time and usually limited to 96 samples per 
array. "Dot-blot" procedures are therefore inadequate 
for applications in which many thousand samples must be 
determined. 

A more efficient technique employed for making 
ordered arrays of genomic fragments uses an array of 
pins dipped into the wells, e.g., the 96 wells of a 
microtitre plate, for transferring an array of samples 
to a substrate, such as a porous membrane. One array
includes pins that are designed to spot a membrane in a
staggered fashion, for creating an array of 9216 spots
in a 22 x 22 cm area (Lehrach, et al., 1990). A
limitation with this approach is that the volume of DNA 
spotted in each pixel of each array is highly variable. 






WO 95/35505 



PCT/US95/07659



In addition, the number of arrays that can be made with 
each dipping is usually quite small. 

An alternate method of creating ordered arrays of 
nucleic acid sequences is described by Pirrung, et al. 
(1992), and also by Fodor, et al. (1991). The method
involves synthesizing different nucleic acid sequences
at different discrete regions of a support. This
method employs elaborate synthetic schemes, and is
generally limited to relatively short nucleic acid
samples, e.g., less than 20 bases. A related method has
been described by Southern, et al. (1992). 

Khrapko, et al. (1991) describes a method of 
making an oligonucleotide matrix by spotting DNA onto a 
thin layer of polyacrylamide. The spotting is done
manually with a micropipette.

None of the methods or devices described in the 
prior art are designed for mass fabrication of
microarrays characterized by (i) a large number of
micro-sized assay regions separated by a distance of
50-200 microns or less, and (ii) a well-defined amount,
typically in the picomole range, of analyte associated 
with each region of the array. 

Furthermore, current technology is directed at 
performing such assays one at a time to a single array
of DNA molecules. For example, the most common method
for performing DNA hybridizations to arrays spotted
onto porous membrane involves sealing the membrane in a
plastic bag (Maniatis, et al., 1989) or a rotating
glass cylinder (Robbins Scientific) with the labeled
hybridization probe inside the sealed chamber. For
arrays made on non-porous surfaces, such as a
microscope slide, each array is incubated with the
labeled hybridization probe sealed under a coverslip.
These techniques require a separate sealed chamber for
each array, which makes the screening and handling of
many such arrays inconvenient and time intensive.

Abouzied, et al. (1994) describes a method of 
printing horizontal lines of antibodies on a 
nitrocellulose membrane and separating regions of the
membrane with vertical stripes of a hydrophobic
material. Each vertical stripe is then reacted with a
different antigen and the reaction between the
immobilized antibody and an antigen is detected using a
standard ELISA colorimetric technique. Abouzied's
technique makes it possible to screen many one-dimensional
arrays simultaneously on a single sheet of
nitrocellulose. Abouzied makes the nitrocellulose
somewhat hydrophobic using a line drawn with PAP Pen
(Research Products International). However, Abouzied
does not describe a technology that is capable of
completely sealing the pores of the nitrocellulose. The
pores of the nitrocellulose are still physically open
and so the assay reagents can leak through the
hydrophobic barrier during extended high temperature
incubations or in the presence of detergents, which
makes the Abouzied technique unacceptable for DNA
hybridization assays. 

Porous membranes with printed patterns of 
hydrophilic/hydrophobic regions exist for applications
such as ordered arrays of bacteria colonies. QA Life
Sciences (San Diego CA) makes such a membrane with a
grid pattern printed on it. However, this membrane has
the same disadvantage as the Abouzied technique since
reagents can still flow between the gridded arrays
making them unusable for separate DNA hybridization
assays.

Pall Corporation makes a 96-well plate with a
porous filter heat sealed to the bottom of the plate.
These plates are capable of containing different
reagents in each well without cross-contamination.
However, each well is intended to hold only one target
element whereas the invention described here makes a
microarray of many biomolecules in each subdivided
region of the solid support. Furthermore, the 96 well
plates are at least 1 cm thick and prevent the use of
the device for many colorimetric, fluorescent and
radioactive detection formats which require that the
membrane lie flat against the detection surface. The
invention described here requires no further processing
after the assay step since the barrier elements are
shallow and do not interfere with the detection step 
thereby greatly increasing convenience. 

Hyseq Corporation has described a method of making
an "array of arrays" on a non-porous solid support for
use with their sequencing by hybridization technique.
The method described by Hyseq involves modifying the
chemistry of the solid support material to form a
hydrophobic grid pattern where each subdivided region
contains a microarray of biomolecules. Hyseq's flat
hydrophobic pattern does not make use of physical
blocking as an additional means of preventing cross
contamination.

Summary of the Invention

The invention includes, in one aspect, a method of 
forming a microarray of analyte-assay regions on a 
solid support, where each region in the array has a 
known amount of a selected, analyte-specific reagent.
The method involves first loading a solution of a
selected analyte-specific reagent in a reagent-dispensing
device having an elongate capillary channel
(i) formed by spaced-apart, coextensive elongate
members, (ii) adapted to hold a quantity of the reagent
solution and (iii) having a tip region at which aqueous
solution in the channel forms a meniscus. The channel
is preferably formed by a pair of spaced-apart tapered 
elements. 

The tip of the dispensing device is tapped against 
a solid support at a defined position on the support
surface with an impulse effective to break the meniscus
in the capillary channel and deposit a selected volume of
solution on the surface, preferably a selected volume
in the range 0.01 to 100 nl. The two steps are
repeated until the desired array is formed.

The method may be practiced in forming a plurality 
of such arrays, where the solution-depositing step is
applied to a selected position on each of a
plurality of solid supports at each repeat cycle.

The dispensing device may be loaded with a new
solution, by the steps of (i) dipping the capillary
channel of the device in a wash solution, (ii) removing
wash solution drawn into the capillary channel, and
(iii) dipping the capillary channel into the new
reagent solution.

Also included in the invention is an automated 
apparatus for forming a microarray of analyte-assay 
regions on a plurality of solid supports, where each 
region in the array has a known amount of a selected, 
analyte-specific reagent. The apparatus has a holder
for holding, at known positions, a plurality of planar
supports, and a reagent dispensing device of the type
described above.

The apparatus further includes positioning
structure for positioning the dispensing device at a
selected array position with respect to a support in
said holder, and dispensing structure for moving the
dispensing device into tapping engagement against a
support with a selected impulse effective to deposit a
selected volume on the support, e.g., a selected volume
in the volume range 0.01 to 100 nl. 

The positioning and dispensing structures are 
controlled by a control unit in the apparatus. The 
unit operates to (i) place the dispensing device at a
loading station, (ii) move the capillary channel in the
device into a selected reagent at the loading station,
to load the dispensing device with the reagent, and
(iii) dispense the reagent at a defined array position
on each of the supports on said holder. The unit may
further operate, at the end of a dispensing cycle, to
wash the dispensing device by (i) placing the
dispensing device at a washing station, (ii) moving the
capillary channel in the device into a wash fluid, to
load the dispensing device with the fluid, and (iii)
removing the wash fluid prior to loading the dispensing
device with a fresh selected reagent. 

The dispensing device in the apparatus may be one 
of a plurality of such devices which are carried on the
arm for dispensing different analyte assay reagents at
selected spaced array positions. 

In another aspect, the invention includes a 
substrate with a surface having a microarray of at 
least 10³ distinct polynucleotide or polypeptide
biopolymers in a surface area of less than about 1 cm².
Each distinct biopolymer (i) is disposed at a separate, 
defined position in said array, (ii) has a length of at 
least 50 subunits, and (iii) is present in a defined 
amount between about 0.1 femtomoles and 100 nanomoles. 

In one embodiment, the surface is a glass slide
surface coated with a polycationic polymer, such as
polylysine, and the biopolymers are polynucleotides.
In another embodiment, the substrate has a water-impermeable
backing, a water-permeable film formed on
the backing, and a grid formed on the film. The grid
is composed of intersecting water-impervious grid
elements extending from said backing to positions
raised above the surface of said film, and partitions
the film into a plurality of water-impervious cells.
A biopolymer array is formed within each cell.

More generally, there is provided a substrate for 
use in detecting binding of labeled polynucleotides to 
one or more of a plurality of different-sequence,
immobilized polynucleotides. The substrate includes, 
in one aspect, a glass support, a coating of a 
polycationic polymer, such as polylysine, on said 
surface of the support, and an array of distinct 
polynucleotides electrostatically bound non-covalently 
to said coating, where each distinct biopolymer is 
disposed at a separate, defined position in a surface 
array of polynucleotides. 

In another aspect, the substrate includes a water- 
impermeable backing, a water-permeable film formed on 
the backing, and a grid formed on the film, where the 
grid is composed of intersecting water-impervious grid 
elements extending from the backing to positions raised 
above the surface of the film, forming a plurality of 
cells. A biopolymer array is formed within each cell.

Also forming part of the invention is a method of 
detecting differential expression of each of a 
plurality of genes in a first cell type, with respect 
to expression of the same genes in a second cell type. 
In practicing the method, there is first produced 
fluorescent-labeled cDNA's from mRNA's isolated from
the two cell types, where the cDNA's from the first
and second cells are labeled with first and second 
different fluorescent reporters. 

A mixture of the labeled cDNA's from the two cell 
types is added to an array of polynucleotides
representing a plurality of known genes derived from
the two cell types, under conditions that result in 
hybridization of the cDNA's to complementary-sequence 
polynucleotides in the array. The array is then 
examined by fluorescence under fluorescence excitation 
conditions in which (i) polynucleotides in the array 
that are hybridized predominantly to cDNA's derived 
from one of the first and second cell types give a 
distinct first or second fluorescence emission color, 
respectively, and (ii) polynucleotides in the array 
that are hybridized to substantially equal numbers of 
cDNA's derived from the first and second cell types 
give a distinct combined fluorescence emission color, 
respectively. The relative expression of known genes 
in the two cell types can then be determined by the 
observed fluorescence emission color of each spot. 
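
The color readout described above can be reduced to a per-spot comparison of the two reporter intensities. The following is a minimal sketch under stated assumptions, not the method of the invention: the function name classify_spot, the twofold cutoff, and the sample intensities are all hypothetical and chosen only for illustration.

```python
def classify_spot(first, second, fold=2.0):
    """Classify one array spot from its two reporter intensities.

    first, second: background-corrected intensities of the first and
    second fluorescent reporters (hypothetical units).
    fold: assumed ratio cutoff for calling a spot predominantly one color.
    """
    if first <= 0 and second <= 0:
        return "no signal"
    if second <= 0 or first / second >= fold:
        return "first"      # hybridized mostly to cDNA from the first cell type
    if first <= 0 or second / first >= fold:
        return "second"     # hybridized mostly to cDNA from the second cell type
    return "combined"       # roughly equal amounts from both cell types


# Illustrative, made-up intensities for three spots.
spots = {"geneA": (900.0, 100.0), "geneB": (120.0, 800.0), "geneC": (500.0, 450.0)}
calls = {name: classify_spot(f, s) for name, (f, s) in spots.items()}
```

A fixed ratio cutoff is only one possible choice; in practice the threshold would depend on how the two reporters are normalized against each other.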

These and other objects and features of the 
invention will become more fully apparent when the 
following detailed description of the invention is read 
in conjunction with the accompanying figures. 



Brief Description of the Drawings
Fig. 1 is a side view of a reagent-dispensing
device having an open-capillary dispensing head
constructed for use in one embodiment of the invention;

Figs. 2A-2C illustrate steps in the delivery of a
fixed-volume bead on a hydrophobic surface employing
the dispensing head from Fig. 1, in accordance with one
embodiment of the method of the invention;

Fig. 3 shows a portion of a two-dimensional array
of analyte-assay regions constructed according to the
method of the invention;

Fig. 4 is a planar view showing components of an
automated apparatus for forming arrays in accordance
with the invention;
Fig. 5 shows a fluorescent image of an actual 20 x
20 array of 400 fluorescently-labeled DNA samples
immobilized on a poly-l-lysine coated slide, where the
total area covered by the 400 element array is 16
square millimeters;

Fig. 6 is a fluorescent image of a 1.8 cm x 1.8 cm
microarray containing lambda clones with yeast inserts,
the fluorescent signal arising from the hybridization
to the array with approximately half the yeast genome
labeled with a green fluorophore and the other half
with a red fluorophore;

Fig. 7 shows the translation of the hybridization
image of Fig. 6 into a karyotype of the yeast genome,
where the elements of the Fig. 6 microarray contain yeast
DNA sequences that have been previously physically
mapped in the yeast genome;

Fig. 8 shows a fluorescent image of a 0.5 cm x 0.5
cm microarray of 24 cDNA clones, where the microarray
was hybridized simultaneously with total cDNA from wild
type Arabidopsis plant labeled with a green fluorophore
and total cDNA from a transgenic Arabidopsis plant
labeled with a red fluorophore, and the arrow points to
the cDNA clone representing the gene introduced into
the transgenic Arabidopsis plant;

Fig. 9 shows a plan view of a substrate having an
array of cells formed by barrier elements in the form
of a grid;

Fig. 10 shows an enlarged plan view of one of the
cells in the substrate in Fig. 9, showing an array of
polynucleotide regions in the cell;

Fig. 11 is an enlarged sectional view of the 
substrate in Fig. 9, taken along a section line in that 
figure; and 

Fig. 12 is a scanned image of a 3 cm x 3 cm
nitrocellulose solid support containing four identical
arrays of M13 clones in each of four quadrants, where
each quadrant was hybridized simultaneously to a
different oligonucleotide using an open face
hybridization method.

Detailed Description of the Invention 

I. Definitions 

Unless indicated otherwise, the terms defined
below have the following meanings:

"Ligand" refers to one member of a ligand/anti-ligand
binding pair. The ligand may be, for example,
one of the nucleic acid strands in a complementary,
hybridized nucleic acid duplex binding pair; an
effector molecule in an effector/receptor binding pair;
or an antigen in an antigen/antibody or
antigen/antibody fragment binding pair.

"Antiligand" refers to the opposite member of a 
ligand/anti-ligand binding pair. The antiligand may be 
the other of the nucleic acid strands in a
complementary, hybridized nucleic acid duplex binding
pair; the receptor molecule in an effector/receptor
binding pair; or an antibody or antibody fragment
molecule in an antigen/antibody or antigen/antibody
fragment binding pair, respectively. 

"Analyte" or "analyte molecule" refers to a
molecule, typically a macromolecule, such as a
polynucleotide or polypeptide, whose presence, amount,
and/or identity are to be determined. The analyte is
one member of a ligand/anti-ligand pair.

"Analyte-specific assay reagent" refers to a
molecule effective to bind specifically to an analyte
molecule. The reagent is the opposite member of a
ligand/anti-ligand binding pair.

An "array of regions on a solid support" is a
linear or two-dimensional array of preferably discrete
regions, each having a finite area, formed on the
surface of a solid support. 

A "microarray" is an array of regions having a 
density of discrete regions of at least about 100/cm²,
and preferably at least about 1000/cm². The regions in
a microarray have typical dimensions, e.g., diameters,
in the range of between about 10-250 µm, and are
separated from other regions in the array by about the 
same distance. 

A support surface is "hydrophobic" if an aqueous-medium
droplet applied to the surface does not spread
out substantially beyond the area size of the applied
droplet. That is, the surface acts to prevent
spreading of the droplet applied to the surface by
hydrophobic interaction with the droplet.

A "meniscus" means a concave or convex surface 
that forms on the bottom of a liquid in a channel as a 
result of the surface tension of the liquid. 

"Distinct biopolymers", as applied to the 
biopolymers forming a microarray, means an array member
which is distinct from other array members on the basis
of a different biopolymer sequence, and/or different
concentrations of the same or distinct biopolymers,
and/or different mixtures of distinct or different-concentration
biopolymers. Thus an array of "distinct
polynucleotides" means an array containing, as its
members, (i) distinct polynucleotides, which may have a
defined amount in each member, (ii) different, graded
concentrations of given-sequence polynucleotides,
and/or (iii) different-composition mixtures of two or
more distinct polynucleotides.

"Cell type" means a cell from a given source,
e.g., a tissue, or organ, or a cell in a given state of
differentiation, or a cell associated with a given
pathology or genetic makeup. 

II. Method of Microarray Formation
This section describes a method of forming a
microarray of analyte-assay regions on a solid support
or substrate, where each region in the array has a
known amount of a selected, analyte-specific reagent.

Fig. 1 illustrates, in a partially schematic view,
a reagent-dispensing device 10 useful in practicing the
method. The device generally includes a reagent 
dispenser 12 having an elongate open capillary channel 
14 adapted to hold a quantity of the reagent solution, 
such as indicated at 16, as will be described below. 

The capillary channel is formed by a pair of spaced-apart,
coextensive, elongate members 12a, 12b which are
tapered toward one another and converge at a tip or tip
region 18 at the lower end of the channel. More
generally, the open channel is formed by at least two
elongate, spaced-apart members adapted to hold a
quantity of reagent solutions and having a tip region
at which aqueous solution in the channel forms a
meniscus, such as the concave meniscus illustrated at
20 in Fig. 2A. The advantages of the open channel
construction of the dispenser are discussed below.

With continued reference to Fig. 1, the dispenser 
device also includes structure for moving the dispenser 
rapidly toward and away from a support surface, for 
effecting deposition of a known amount of solution in 

the dispenser on a support, as will be described below
with reference to Figs. 2A-2C. In the embodiment
shown, this structure includes a solenoid 22 which is
activatable to draw a solenoid piston 24 rapidly
downwardly, then release the piston, e.g., under spring
bias, to a normal, raised position, as shown. The
dispenser is carried on the piston by a connecting
member 26, as shown. The just-described moving
structure is also referred to herein as dispensing
means for moving the dispenser into engagement with a
solid support, for dispensing a known volume of fluid
on the support. 

The dispensing device just described is carried on 
an arm 28 that may be moved either linearly or in an x-y
plane to position the dispenser at a selected
deposition position, as will be described.

Figs. 2A-2C illustrate the method of depositing a 
known amount of reagent solution in the just-described 
dispenser on the surface of a solid support, such as 
the support indicated at 30. The support is a polymer, 
glass, or other solid-material support having a surface
indicated at 31. 

In one general embodiment, the surface is a 
relatively hydrophilic, i.e., wettable surface, such as 
a surface having native, bound or covalently attached 
charged groups. One such surface described below is a
glass surface having an absorbed layer of a 
polycationic polymer, such as poly-l-lysine. 

In another embodiment, the surface has or is 
formed to have a relatively hydrophobic character, 
i.e., one that causes aqueous medium deposited on the
surface to bead. A variety of known hydrophobic
polymers, such as polystyrene, polypropylene, or
polyethylene have desired hydrophobic properties, as do
glass and a variety of lubricant or other hydrophobic
films that may be applied to the support surface.

Initially, the dispenser is loaded with a selected 
analyte-specific reagent solution, such as by dipping
the dispenser tip, after washing, into a solution of
the reagent, and allowing filling by capillary flow
into the dispenser channel. The dispenser is now moved
to a selected position with respect to a support
surface, placing the dispenser tip directly above the
support-surface position at which the reagent is to be
deposited. This movement takes place with the
dispenser tip in its raised position, as seen in Fig.
2A, where the tip is typically several millimeters (1-5 mm)
above the surface of the substrate. 

With the dispenser so positioned, solenoid 22 is 
now activated to cause the dispenser tip to move
rapidly toward and away from the substrate surface,
making momentary contact with the surface, in effect,
tapping the tip of the dispenser against the support
surface. The tapping movement of the tip against the
surface acts to break the liquid meniscus in the tip
channel, bringing the liquid in the tip into contact
with the support surface. This, in turn, produces a 
flowing of the liquid into the capillary space between 
the tip and the surface, acting to draw liquid out of 
the dispenser channel, as seen in Fig. 2B. 

Fig. 2C shows flow of fluid from the tip onto the
support surface, which in this case is a hydrophobic
surface. The figure illustrates that liquid continues
to flow from the dispenser onto the support surface
until it forms a liquid bead 32. At a given bead size,
i.e., volume, the tendency of liquid to flow onto the
surface will be balanced by the hydrophobic surface
interaction of the bead with the support surface, which
acts to limit the total bead area on the surface, and
by the surface tension of the droplet, which tends
toward a given bead curvature. At this point, a given
bead volume will have formed, and continued contact of 
the dispenser tip with the bead, as the dispenser tip 
is being withdrawn, will have little or no effect on 
bead volume. 
For liquid-dispensing on a more hydrophilic
surface, the liquid will have less of a tendency to 
bead, and the dispensed volume will be more sensitive 
to the total dwell time of the dispenser tip in the 
immediate vicinity of the support surface, e.g., the 
positions illustrated in Figs. 2B and 2C. 

The desired deposition volume, i.e., bead volume, 
formed by this method is preferably in the range 2 pl
(picoliters) to 2 nl (nanoliters), although volumes as
high as 100 nl or more may be dispensed. It will be
appreciated that the selected dispensed volume will
depend on (i) the "footprint" of the dispenser tip,
i.e., the size of the area spanned by the tip, (ii) the
hydrophobicity of the support surface, and (iii) the
time of contact with and rate of withdrawal of the tip
from the support surface. In addition, bead size may
be reduced by increasing the viscosity of the medium, 
effectively reducing the flow time of liquid from the 
dispenser onto the support surface. The drop size may 
be further constrained by depositing the drop in a 
hydrophilic region surrounded by a hydrophobic grid 
pattern on the support surface. 

In a typical embodiment, the dispenser tip is 
tapped rapidly against the support surface, with a 
total residence time in contact with the support of 
less than about 1 msec, and a rate of upward travel 
from the surface of about 10 cm/sec. 

Assuming that the bead that forms on contact with 
the surface is a hemispherical bead, with a diameter 
approximately equal to the width of the dispenser tip, 
as shown in Fig. 2C, the volume of the bead formed in
relation to dispenser tip width (d) is given in Table 1
below. As seen, the volume of the bead ranges between
2 pl and 2 nl as the width size is increased from about
20 to 200 µm.
Table 1

d          Volume (nl)
20 µm      2 x 10⁻³
50 µm      3.1 x 10⁻²
100 µm     2.5 x 10⁻¹
200 µm     2
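
The hemispherical-bead assumption stated above can be checked with a short calculation. This is a sketch only: the function name hemisphere_volume_nl is hypothetical, and the formula is simply the volume of a hemisphere, V = (2/3)·π·(d/2)³, with d taken as the tip width. The results land close to the tabulated values.

```python
import math


def hemisphere_volume_nl(tip_width_um):
    """Volume in nanoliters of a hemispherical bead whose diameter
    equals the dispenser tip width (given in micrometers)."""
    radius_um = tip_width_um / 2.0
    volume_um3 = (2.0 / 3.0) * math.pi * radius_um ** 3
    return volume_um3 * 1e-6  # 1 nl = 1e6 cubic micrometers


# Print the computed volume for each tip width listed in the table.
for d in (20, 50, 100, 200):
    print(f"{d:>4} um  {hemisphere_volume_nl(d):.2e} nl")
```

Because the volume scales with the cube of the tip width, a tenfold change in width (20 to 200 µm) gives the thousandfold change in volume (about 2 pl to about 2 nl) described in the text.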



At a given tip size, bead volume can be reduced in
a controlled fashion by increasing surface
hydrophobicity, reducing time of contact of the tip
with the surface, increasing rate of movement of the
tip away from the surface, and/or increasing the
viscosity of the medium. Once these parameters are
fixed, a selected deposition volume in the desired pl
to nl range can be achieved in a repeatable fashion. 

After depositing a bead at one selected location 
on a support, the tip is typically moved to a
corresponding position on a second support, a droplet
is deposited at that position, and this process is 
repeated until a liquid droplet of the reagent has been 
deposited at a selected position on each of a plurality 
of supports. 

The tip is then washed to remove the reagent
liquid, filled with another reagent liquid and this
reagent is now deposited at another array position
on each of the supports. In one embodiment, the tip is
washed and refilled by the steps of (i) dipping the
capillary channel of the device in a wash solution,
(ii) removing wash solution drawn into the capillary 
channel, and (iii) dipping the capillary channel into 
the new reagent solution. 
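
The deposit, wash and reload cycle just described can be laid out as an ordered list of steps. The sketch below is illustrative only; the function name print_schedule, the step labels, and the reagent names are hypothetical and do not correspond to any control software described in the source.

```python
def print_schedule(reagents, n_supports):
    """Return the ordered steps for printing one spot per reagent
    on each of n_supports supports."""
    steps = []
    for position, reagent in enumerate(reagents):
        if position > 0:
            # Wash the tip and remove the wash solution before the next reagent.
            steps.append(("dip_in_wash", None))
            steps.append(("remove_wash", None))
        steps.append(("load", reagent))
        for support in range(n_supports):
            # Tap at the same array position on every support.
            steps.append(("tap", (support, position)))
    return steps


steps = print_schedule(["reagent_1", "reagent_2"], n_supports=3)
```

Ordering the work this way means each reagent is loaded once and spotted on every support before the tip is washed, which matches the sequence given in the text.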

From the foregoing, it will be appreciated that
the tweezers-like, open-capillary dispenser tip
provides the advantages that (i) the open channel of
the tip facilitates rapid, efficient washing and drying
before reloading the tip with a new reagent, (ii)
passive capillary action can load the sample directly
from a standard microwell plate while retaining
sufficient sample in the open capillary reservoir for
the printing of numerous arrays, (iii) open capillaries
are less prone to clogging than closed capillaries, and
(iv) open capillaries do not require a perfectly faced
bottom surface for fluid delivery.

A portion of a microarray 36 formed on the surface 
38 of a solid support 40 in accordance with the method 
just described is shown in Fig. 3. The array is formed
of a plurality of analyte-specific reagent regions,
such as regions 42, where each region may include a
different analyte-specific reagent. As indicated
above, the diameter of each region is preferably
between about 20-200 µm. The spacing between each
region and its closest (non-diagonal) neighbor,
measured from center-to-center (indicated at 44), is
preferably in the range of about 20-400 µm. Thus, for
example, an array having a center-to-center spacing of
about 250 µm contains about 40 regions/cm or 1,600
regions/cm². After formation of the array, the support
is treated to evaporate the liquid of the droplet
forming each region, to leave a desired array of dried,
relatively flat regions. This drying may be done by 
heating or under vacuum. 
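
The density figures quoted above follow directly from the center-to-center spacing. As a minimal sketch, assuming a square grid (the function name array_density is hypothetical):

```python
def array_density(spacing_um):
    """Regions per cm and per square cm for a square grid with the
    given center-to-center spacing in micrometers."""
    per_cm = 10_000 / spacing_um  # 1 cm = 10,000 micrometers
    return per_cm, per_cm ** 2


per_cm, per_cm2 = array_density(250)
```

With a 250 µm spacing this gives 40 regions per cm and 1,600 regions per cm², the values stated in the text.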

In some cases, it is desired to first rehydrate 
the droplets containing the analyte reagents to allow
for more time for adsorption to the solid support. It 
is also possible to spot out the analyte reagents in a 
humid environment so that droplets do not dry until the 
arraying operation is complete. 
III. Automated Apparatus for Forming Arrays

In another aspect, the invention includes an 
automated apparatus for forming an array of analyte- 
assay regions on a solid support, where each region in 
the array has a known amount of a selected, analyte-
specific reagent.

The apparatus is shown in planar, and partially 
schematic view in Fig. 4. A dispenser device 72 in the 
apparatus has the basic construction described above 
with respect to Fig. 1, and includes a dispenser 74
having an open-capillary channel terminating at a tip,
substantially as shown in Figs. 1 and 2A-2C. 

The dispenser is mounted in the device for 
movement toward and away from a dispensing position at 
which the tip of the dispenser taps a support surface,
to dispense a selected volume of reagent solution, as
described above. This movement is effected by a
solenoid 76 as described above. Solenoid 76 is under
the control of a control unit 77 whose operation will
be described below. The solenoid is also referred to
herein as dispensing means for moving the device into 
tapping engagement with a support, when the device is 
positioned at a defined array position with respect to 
that support. 

The dispenser device is carried on an arm 74 which
is threadedly mounted on a worm screw 80 driven
(rotated) in a desired direction by a stepper motor 82
also under the control of unit 77. At its left end in
the figure, screw 80 is carried in a sleeve 84 for
rotation about the screw axis. At its other end, the
screw is mounted to the drive shaft of the stepper
motor, which in turn is carried on a sleeve 86. The
dispenser device, worm screw, the two sleeves mounting
the worm screw, and the stepper motor used in moving
the device in the "x" (horizontal) direction in the
figure form what is referred to here collectively as a
displacement assembly 86.

The displacement assembly is constructed to 
produce precise, micro-range movement in the direction 
of the screw, i.e., along an x axis in the figure. In
one mode, the assembly functions to move the dispenser
in x-axis increments having a selected distance in the
range 5-25 µm. In another mode, the dispenser unit may
be moved in precise x-axis increments of several
microns or more, for positioning the dispenser at
associated positions on adjacent supports, as will be
described below. 

The displacement assembly, in turn, is mounted for 
movement in the "y" (vertical) axis of the figure, for
positioning the dispenser at a selected y axis
position. The structure mounting the assembly includes
a fixed rod 88 mounted rigidly between a pair of frame
bars 90, 92, and a worm screw 94 mounted for rotation
between a pair of frame bars 96, 98. The worm screw is
driven (rotated) by a stepper motor 100 which operates
under the control of unit 77. The motor is mounted on 
bar 96, as shown. 

The structure just described, including worm screw 
94 and motor 100, is constructed to produce precise,
micro-range movement in the direction of the screw,
i.e., along a y axis in the figure. As above, the
structure functions in one mode to move the dispenser
in y-axis increments having a selected distance in the
range 5-250 µm, and in a second mode, to move the
dispenser in precise y-axis increments of several
microns (µm) or more, for positioning the dispenser at
associated positions on adjacent supports. 
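
The conversion from a desired axis increment to a number of stepper-motor steps can be sketched as follows. The specification does not give the screw lead or the motor resolution, so the values used here (a 1000 micron lead and 200 steps per revolution) are purely hypothetical, as is the function name.

```python
def steps_for_move(distance_um: float,
                   screw_lead_um: float = 1000.0,  # hypothetical: travel per screw revolution
                   steps_per_rev: int = 200) -> int:  # hypothetical: motor steps per revolution
    """Number of whole motor steps that best approximates a linear move."""
    um_per_step = screw_lead_um / steps_per_rev
    return round(distance_um / um_per_step)


if __name__ == "__main__":
    # Increments drawn from the ranges given in the text.
    for increment_um in (5, 25, 250):
        print(increment_um, "microns ->", steps_for_move(increment_um), "steps")
```

Under these assumed values each motor step advances the dispenser 5 microns, which would be the smallest increment the worm screw could deliver.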

The displacement assembly and structure for moving 
this assembly in the y axis are referred to herein
collectively as positioning means for positioning the
dispensing device at a selected array position with
respect to a support. 

A holder 102 in the apparatus functions to hold a 
plurality of supports, such as supports 104, on which
the microarrays of reagent regions are to be formed by
the apparatus. The holder provides a number of
recessed slots, such as slot 106, which receive the
supports, and position them at precise selected
positions with respect to the frame bars on which the
dispenser moving means is mounted.

As noted above, the control unit in the device 
functions to actuate the two stepper motors and 
dispenser solenoid in a sequence designed for automated 
operation of the apparatus in forming a selected
microarray of reagent regions on each of a plurality of
supports. 

The control unit is constructed, according to 
conventional microprocessor control principles, to 
provide appropriate signals to the solenoid and to
each of the stepper motors, in a given timed sequence
and for appropriate signalling time. The construction
of the unit, and the settings that are selected by the
user to achieve a desired array pattern, will be
understood from the following description of a typical
apparatus operation.

Initially, one or more supports are placed in one 
or more slots in the holder. The dispenser is then 
moved to a position directly above a well (not shown) 
containing a solution of the first reagent to be
dispensed on the support(s). The dispenser solenoid is
now actuated to lower the dispenser tip into this well,
causing the capillary channel in the dispenser to fill.
Motors 82, 100 are now actuated to position the
dispenser at a selected array position at the first of
the supports. Solenoid actuation of the dispenser is
then effective to dispense a selected-volume droplet of
that reagent at this location. As noted above, this
operation is effective to dispense a selected volume
preferably between 2 pl and 2 nl of the reagent
solution. 

The dispenser is now moved to the corresponding 
position at an adjacent support and a similar volume of 
the solution is dispensed at this position. The 
process is repeated until the reagent has been 
dispensed at this preselected corresponding position on 
each of the supports. 

Where it is desired to dispense a single reagent 
at more than two array positions on a support, the 
dispenser may be moved to different array positions at 
each support, before moving the dispenser to a new 
support, or solution can be dispensed at individual 
positions on each support, at one selected position, 
then the cycle repeated for each new array position. 

To dispense the next reagent, the dispenser is 
positioned over a wash solution (not shown) , and the 
dispenser tip is dipped in and out of this solution 
until the reagent solution has been substantially 
washed from the tip. Solution can be removed from the 
tip, after each dipping, by vacuum, compressed air 
spray, sponge, or the like.

The dispenser tip is now dipped in a second 
reagent well, and the filled tip is moved to a second 
selected array position in the first support. The 
process of dispensing reagent at each of the 
corresponding second-array positions is then carried out as
above. This process is repeated until an entire 
microarray of reagent solutions on each of the supports 
has been formed. 
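
The order of operations described in this section can be summarized as a simple schedule. The following is a minimal sketch, assuming one array position per reagent on each support; the function name and the tuple layout are illustrative only, and the actual control unit is described only in general terms in the specification.

```python
def print_schedule(reagents, n_supports):
    """List the steps for printing one spot of each reagent on every support.

    Each step is a tuple (action, reagent, target), where target is
    (support index, array position index) for a dispense step.
    """
    steps = []
    for position, reagent in enumerate(reagents):
        if position > 0:
            # Wash the tip before loading a new reagent.
            steps.append(("wash", None, None))
        # Dip the tip in the reagent well so the capillary channel fills.
        steps.append(("load", reagent, None))
        # Tap the same array position on each support in turn.
        for support in range(n_supports):
            steps.append(("dispense", reagent, (support, position)))
    return steps


if __name__ == "__main__":
    for step in print_schedule(["reagent A", "reagent B"], n_supports=2):
        print(step)
```

In this ordering the tip is loaded once per reagent and visits every support before it is washed, which corresponds to the first of the two dispensing sequences described above.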
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This section describes embodiments of a substrate 
having a microarray of biological polymers carried on 
the substrate surface. Subsection A describes a multi- 
cell substrate, each cell of which contains a 
microarray, and preferably an identical microarray, of 
distinct biopolymers, such as distinct polynucleotides, 
formed on a porous surface. Subsection B describes a 
microarray of distinct polynucleotides bound on a glass 
slide coated with a polycationic polymer. 



A. Multi-Cell Substrate

Fig. 9 illustrates, in plan view, a substrate 110
constructed according to the invention. The substrate
has an 8 x 12 rectangular array 112 of cells, such as
cells 114, 116, formed on the substrate surface. With
reference to Fig. 10, each cell, such as cell 114, in
turn supports a microarray 118 of distinct biopolymers,
such as polypeptides or polynucleotides, at known,
addressable regions of the microarray. Two such
regions forming the microarray are indicated at 120,
and correspond to regions, such as regions 42, forming
the microarray of distinct biopolymers shown in Fig. 3.

The 96-cell array shown in Fig. 9 typically has
array dimensions between about 12 and 244 mm in width
and 8 and 400 mm in length, with the cells in the array
having width and length dimensions of 1/12 and 1/8 the
array width and length dimensions, respectively, i.e.,
between about 1 and 20 mm in width and 1 and 50 mm in
length. 
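
The cell dimensions quoted above follow from dividing the array dimensions by the number of cells along each side. A minimal sketch, with an illustrative function name and assuming grid lines of negligible width:

```python
def cell_dimensions(array_width_mm: float, array_length_mm: float,
                    cols: int = 12, rows: int = 8):
    """Width and length of one cell in an array with cols x rows cells."""
    return array_width_mm / cols, array_length_mm / rows


if __name__ == "__main__":
    print(cell_dimensions(12, 8))     # smallest array in the stated range
    print(cell_dimensions(244, 400))  # largest array in the stated range
```

The largest array gives a cell width slightly above 20 mm, which the text rounds to "about 20".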

The construction of the substrate is shown cross-
sectionally in Fig. 11, which is an enlarged sectional
view taken along view line 124 in Fig. 9. The
substrate includes a water-impermeable backing 126,
such as a glass slide or rigid polymer sheet. Formed
on the surface of the backing is a water-permeable film
128. The film is formed of a porous membrane material,
such as nitrocellulose membrane, or a porous web
material, such as a nylon, polypropylene, or PVDF
porous polymer material. The thickness of the film is
preferably between about 10 and 1000 µm. The film may
be applied to the backing by spraying or coating
uncured material on the backing, or by applying a
preformed membrane to the backing. The backing and
film may be obtained as a preformed unit from a
commercial source, e.g., a plastic-backed
nitrocellulose film available from Schleicher and
Schuell Corporation. 

With continued reference to Fig. 11, the film-
covered surface in the substrate is partitioned into a
desired array of cells by water-impermeable grid lines,
such as lines 130, 132, which have infiltrated the film
down to the level of the backing, and extend above the
surface of the film as shown, typically a distance of
100 to 2000 µm above the film surface.

The grid lines are formed on the substrate by
laying down an uncured or otherwise flowable resin or
elastomer solution in an array grid, allowing the
material to infiltrate the porous film down to the
backing, then curing or otherwise hardening the grid
lines to form the cell-array substrate.

One preferred material for the grid is a flowable
silicone available from Loctite Corporation. The
barrier material can be extruded through a narrow
syringe (e.g., 22 gauge) using air pressure or
mechanical pressure. The syringe is moved relative to
the solid support to print the barrier elements as a
grid pattern. The extruded bead of silicone wicks into
the pores of the solid support and cures to form a
shallow waterproof barrier separating the regions of
the solid support.

In alternative embodiments, the barrier element
can be a wax-based material or a thermoset material
such as epoxy. The barrier material can also be a UV-
curing polymer which is exposed to UV light after being
printed onto the solid support. The barrier material
may also be applied to the solid support using printing
techniques such as silk-screen printing. The barrier
material may also be a heat-seal stamping of the porous
solid support which seals its pores and forms a water-
impervious barrier element. The barrier material may
also be a shallow grid which is laminated or otherwise 
adhered to the solid support. 

In addition to plastic-backed nitrocellulose, the 
solid support can be virtually any porous membrane with
or without a non-porous backing. Such membranes are
readily available from numerous vendors and are made
from nylon, PVDF, polysulfone and the like. In an
alternative embodiment, the barrier element may also be
used to adhere the porous membrane to a non-porous
backing in addition to functioning as a barrier to
prevent cross contamination of the assay reagents. 

In an alternative embodiment, the solid support 
can be of a non-porous material. The barrier can be 
printed either before or after the microarray of
biomolecules is printed on the solid support.

As can be appreciated, the cells formed by the 
grid lines and the underlying backing are water- 
impermeable, having side barriers projecting above the 
porous film in the cells. Thus, defined-volume samples
can be placed in each well without risk of cross-
contamination with sample material in adjacent cells.
In Fig. 11, defined-volume samples, such as sample
134, are shown in the cells. 

As noted above, each well contains a microarray of
distinct biopolymers. In one general embodiment, the
microarrays in the wells are identical arrays of
distinct biopolymers, e.g., different-sequence
polynucleotides. Such arrays can be formed in
accordance with the methods described in Section II, by
depositing a first selected polynucleotide at the same
selected microarray position in each of the cells, then
depositing a second polynucleotide at a different
microarray position in each well, and so on until a
complete, identical microarray is formed in each cell.

In a preferred embodiment, each microarray
contains about 10³ distinct polynucleotide or
polypeptide biopolymers per surface area of less than
about 1 cm². Also in a preferred embodiment, the
biopolymers in each microarray region are present in a
defined amount between about 0.1 femtomoles and 100
nanomoles. The ability to form high-density arrays of
biopolymers, where each region is formed of a well-
defined amount of deposited material, can be achieved
in accordance with the microarray-forming method
described in Section II.

Also in a preferred embodiment, the biopolymers
are polynucleotides having lengths of at least about 50
bp, i.e., substantially longer than oligonucleotides
which can be formed in high-density arrays by schemes
involving parallel, step-wise polymer synthesis on the
array surface. 

In the case of a polynucleotide array, in an assay 
procedure, a small volume of the labeled DNA probe 
mixture in a standard hybridization solution is loaded
onto each cell. The solution will spread to cover the
entire microarray and stop at the barrier elements.
The solid support is then incubated in a humid chamber
at the appropriate temperature as required by the
assay.

Each assay may be conducted in an "open-face"
format where no further sealing step is required, since
the hybridization solution will be kept properly
hydrated by the water vapor in the humid chamber. At
the conclusion of the incubation step, the entire solid
support containing the numerous microarrays is rinsed
quickly enough to dilute the assay reagents so that no
significant cross contamination occurs. The entire
solid support is then reacted with detection reagents
if needed and analyzed using standard colorimetric,
radioactive or fluorescent detection means. All
processing and detection steps are performed
simultaneously on all of the microarrays on the solid
support, ensuring uniform assay conditions for all of
the microarrays on the solid support.

B. Glass-Slide Polynucleotide Array

Fig. 5 shows a substrate 136 formed according to
another aspect of the invention, and intended for use
in detecting binding of labeled polynucleotides to one
or more of a plurality of distinct polynucleotides. The
substrate includes a glass substrate 138 having formed
on its surface a coating of a polycationic polymer,
preferably a cationic polypeptide, such as polylysine
or polyarginine. Formed on the polycationic coating is
a microarray 140 of distinct polynucleotides, each 
localized at known selected array regions, such as 
regions 142. 

The slide is coated by placing a uniform-thickness 
film of a polycationic polymer, e.g., poly-l-lysine, on
the surface of a slide and drying the film to form a
dried coating. The amount of polycationic polymer
added is sufficient to form at least a monolayer of
polymers on the glass surface. The polymer film is
bound to the surface via electrostatic binding between
negative silyl-OH groups on the surface and charged
amine groups in the polymers. Poly-l-lysine coated 
glass slides may be obtained commercially, e.g., from 
Sigma Chemical Co. (St. Louis, MO). 

To form the microarray, defined volumes of 
distinct polynucleotides are deposited on the polymer- 
coated slide, as described in Section II. According to 
an important feature of the substrate, the deposited 
polynucleotides remain bound to the coated slide 
surface non-covalently when an aqueous DNA sample is 
applied to the substrate under conditions which allow 
hybridization of reporter-labeled polynucleotides in 
the sample to complementary-sequence (single-stranded) 
polynucleotides in the substrate array. The method is 
illustrated in Examples 1 and 2.

To illustrate this feature, a substrate of the 
type just described, but having an array of same- 
sequence polynucleotides, was mixed with fluorescent- 
labeled complementary DNA under hybridization 
conditions. After washing to remove non-hybridized
material, the substrate was examined by low-power
fluorescence microscopy. The array can be visualized
by the relatively uniform labeling pattern of the array
regions.

In a preferred embodiment, each microarray
contains at least 10³ distinct polynucleotide or
polypeptide biopolymers per surface area of less than
about 1 cm². In the embodiment shown in Fig. 5, the
microarray contains 400 regions in an area of about 16
mm², or 2.5 x 10³ regions/cm². Also in a preferred
embodiment, the polynucleotides in each microarray
region are present in a defined amount between about
0.1 femtomoles and 100 nanomoles. As above, the
ability to form high-density arrays of this type,
where each region is formed of a well-defined amount
of deposited material, can be achieved in accordance
with the microarray-forming method described in
Section II.

Also in a preferred embodiment, the
polynucleotides have lengths of at least about 50 bp,
i.e., substantially longer than oligonucleotides which 
can be formed in high-density arrays by various in situ 
synthesis schemes. 

V. Utility

Microarrays of immobilized nucleic acid sequences
prepared in accordance with the invention can be used
for large scale hybridization assays in numerous
genetic applications, including genetic and physical
mapping of genomes, monitoring of gene expression, DNA
sequencing, genetic diagnosis, genotyping of organisms,
and distribution of DNA reagents to researchers.

For gene mapping, a gene or a cloned DNA fragment
is hybridized to an ordered array of DNA fragments, and
the identity of the DNA elements applied to the array
is unambiguously established by the pixel or pattern of
pixels of the array that are detected. One application
of such arrays for creating a genetic map is described
by Nelson, et al. (1993). In constructing physical
maps of the genome, arrays of immobilized cloned DNA
fragments are hybridized with other cloned DNA
fragments to establish whether the cloned fragments in
the probe mixture overlap and are therefore contiguous
to the immobilized clones on the array. For example,
Lehrach, et al., describe such a process. 

The arrays of immobilized DNA fragments may also 
be used for genetic diagnostics. To illustrate, an 
array containing multiple forms of a mutated gene or
genes can be probed with a labeled mixture of a
patient's DNA which will preferentially interact with
only one of the immobilized versions of the gene.

The detection of this interaction can lead to a
medical diagnosis. Arrays of immobilized DNA fragments
can also be used in DNA probe diagnostics. For
example, the identity of a pathogenic microorganism can
be established unambiguously by hybridizing a sample of
the unknown pathogen's DNA to an array containing many
types of known pathogenic DNA. A similar technique can
also be used for unambiguous genotyping of any
organism. Other molecules of genetic interest, such as
cDNAs and RNAs, can be immobilized on the array or
alternately used as the labeled probe mixture that is 
applied to the array. 

In one application, an array of cDNA clones
representing genes is hybridized with total cDNA from
an organism to monitor gene expression for research or
diagnostic purposes. Labeling total cDNA from a normal
cell with one color fluorophore and total cDNA from a
diseased cell with another color fluorophore and
simultaneously hybridizing the two cDNA samples to the
same array of cDNA clones allows for differential gene
expression to be measured as the ratio of the two
fluorophore intensities. This two-color experiment can
be used to monitor gene expression in different tissue
types, disease states, response to drugs, or response
to environmental factors. An example of this approach
is illustrated in Example 2, described with respect to
Fig. 8. 
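
The ratio measurement described above can be illustrated with a short calculation. This is a minimal sketch, not the analysis actually used; expressing the ratio on a log2 scale, the channel names, and the intensity floor that guards against division by zero are all assumptions made for illustration.

```python
import math


def log2_expression_ratio(red: float, green: float, floor: float = 1.0) -> float:
    """Log2 ratio of two fluorophore intensities measured at one array element.

    Positive values mean the red-labeled sample is more abundant,
    negative values mean the green-labeled sample is more abundant,
    and zero means equal signal. Intensities below `floor` are raised
    to `floor` (an assumed safeguard, not part of the specification).
    """
    red = max(red, floor)
    green = max(green, floor)
    return math.log2(red / green)


if __name__ == "__main__":
    # Hypothetical intensities for three array elements.
    for name, red, green in [("clone 1", 800, 100), ("clone 2", 100, 100), ("clone 3", 50, 400)]:
        print(name, log2_expression_ratio(red, green))
```

A log scale is used here only because it treats an eight-fold increase and an eight-fold decrease symmetrically; the text itself specifies only that the ratio of the two intensities is measured.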

By way of example and without implying a
limitation of scope, such a procedure could be used to
simultaneously screen many patients against all known
mutations in a disease gene. This invention could be
used in the form of, for example, 96 identical 0.9 cm x
2.2 cm microarrays fabricated on a single 12 cm x 18 cm
sheet of plastic-backed nitrocellulose where each
microarray could contain, for example, 100 DNA
fragments representing all known mutations of a given
gene. The region of interest from each of the DNA
samples from 96 patients could be amplified, labeled,
and hybridized to the 96 individual arrays with each
assay performed in 100 microliters of hybridization
solution. The approximately 1 mm thick silicone rubber
barrier elements between individual arrays prevent
cross contamination of the patient samples by sealing
the pores of the nitrocellulose and by acting as a
physical barrier between each microarray. The solid
support containing all 96 microarrays assayed with the
96 patient samples is incubated, rinsed, detected and
analyzed as a single sheet of material using standard
radioactive, fluorescent, or colorimetric detection
means (Maniatis, et al., 1989). Previously, such a
procedure would involve the handling, processing and
tracking of 96 separate membranes in 96 separate sealed
chambers. By processing all 96 arrays as a single
sheet of material, significant time and cost savings
are possible. 

The assay format can be reversed where the patient 
or organism's DNA is immobilized as the array elements
and each array is hybridized with a different mutated
allele or genetic marker. The gridded solid support
can also be used for parallel non-DNA ELISA assays.
Furthermore, the invention allows for the use of all
standard detection methods without the need to remove
the shallow barrier elements to carry out the detection
step. 

In addition to the genetic applications listed 
above, arrays of whole cells, peptides, enzymes, 
antibodies, antigens, receptors, ligands,
phospholipids, polymers, drug congener preparations or
chemical substances can be fabricated by the means
described in this invention for large scale screening 
assays in medical diagnostics, drug discovery, 
molecular biology, immunology and toxicology. 

The multi-cell substrate aspect of the invention 
allows for the rapid and convenient screening of many 
DNA probes against many ordered arrays of DNA 
fragments. This eliminates the need to handle and 
detect many individual arrays for performing mass 
screenings for genetic research and diagnostic 
applications. Numerous microarrays can be fabricated 
on the same solid support and each microarray reacted 
with a different DNA probe while the solid support is 
processed as a single sheet of material. 

The following examples illustrate, but in no way 
are intended to limit, the present invention. 



Example 1 

Genomic-Complexity Hybridization to Micro
DNA Arrays Representing the Yeast
Saccharomyces cerevisiae Genome with
Two-Color Fluorescent Detection

The array elements were randomly amplified PCR
(Bohlander, et al., 1992) products using physically
mapped lambda clones of S. cerevisiae genomic DNA
templates (Riles, et al., 1993). The PCR was performed
directly on the lambda phage lysates resulting in an
amplification of both the 35 kb lambda vector and the
5-15 kb yeast insert sequences in the form of a uniform
distribution of PCR product between 250-1500 base pairs
in length. The PCR product was purified using
Sephadex G50 gel filtration (Pharmacia, Piscataway, NJ)
and concentrated by evaporation to dryness at room
temperature overnight. Each of the 864 amplified
lambda clones was rehydrated in 15 µl of 3 x SSC in
preparation for spotting onto the glass. 

The micro arrays were fabricated on microscope 
slides which were coated with a layer of poly-l-lysine 
(Sigma). The automated apparatus described in Section
IV loaded 1 µl of the concentrated lambda clone PCR
product in 3 x SSC directly from 96 well storage plates
into the open capillary printing element and deposited
about 5 nl of sample per slide at 380 micron spacing between
spots, on each of 40 slides. The process was repeated
for all 864 samples and 8 control spots. After the
spotting operation was complete, the slides were
rehydrated in a humid chamber for 2 hours, baked in a
dry 80°C vacuum oven for 2 hours, rinsed to remove
unabsorbed DNA and then treated with succinic anhydride
to reduce non-specific adsorption of the labeled
hybridization probe to the poly-l-lysine coated glass
surface. Immediately prior to use, the immobilized DNA
on the array was denatured in distilled water at 90°C
for 2 minutes.

For the pooled chromosome experiment, the 16 
chromosomes of Saccharomyces cerevisiae were separated 
in a CHEF agarose gel apparatus (Biorad, Richmond, CA).
The six largest chromosomes were isolated in one gel
slice and the smallest 10 chromosomes in a second gel
slice. The DNA was recovered using a gel extraction
kit (Qiagen, Chatsworth, CA). The two chromosome pools
were randomly amplified in a manner similar to that
used for the target lambda clones. Following
amplification, 5 micrograms of each of the amplified
chromosome pools were separately random-primer labeled
using Klenow polymerase (Amersham, Arlington Heights,
IL) with a lissamine conjugated nucleotide analog
(Dupont NEN, Boston, MA) for the pool containing the
six largest chromosomes, and with a fluorescein
conjugated nucleotide analog (BMB) for the pool
containing the smallest ten chromosomes. The two pools
were mixed and concentrated using an ultrafiltration
device (Amicon, Danvers, MA).

Five micrograms of the hybridization probe
consisting of both chromosome pools in 7.5 µl of TE was
denatured in a boiling water bath and then snap cooled
on ice. 2.5 µl of concentrated hybridization solution
(5 x SSC and 0.1% SDS) was added and all 10 µl
transferred to the array surface, covered with a cover
slip, placed in a custom-built single-slide humidity
chamber and incubated at 60°C for 12 hours. The slides
were then rinsed at room temperature in 0.1 x SSC and
0.1% SDS for 5 minutes, cover slipped and scanned.

A custom built laser fluorescent scanner was used
to detect the two-color hybridization signals from the
1.8 x 1.8 cm array at 20 micron resolution. The
scanned image was gridded and analyzed using custom
image analysis software. After correcting for optical
crosstalk between the fluorophores due to their
overlapping emission spectra, the red and green
hybridization values for each clone on the array were
correlated to the known physical map position of the
clone resulting in a computer-generated color karyotype
of the yeast genome.
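
The specification does not state how the crosstalk correction was carried out. One common approach, shown here only as a sketch under that assumption, treats each measured channel as the true signal plus a fixed fraction of the other fluorophore's signal and solves the resulting pair of linear equations. The bleed-through fractions and all names below are hypothetical.

```python
def unmix(measured_red: float, measured_green: float,
          green_into_red: float, red_into_green: float):
    """Recover true red and green signals from two crosstalk-affected channels.

    Assumed model (not taken from the specification):
        measured_red   = red   + green_into_red * green
        measured_green = green + red_into_green * red
    """
    det = 1.0 - green_into_red * red_into_green
    if det == 0:
        raise ValueError("bleed-through fractions make the system unsolvable")
    red = (measured_red - green_into_red * measured_green) / det
    green = (measured_green - red_into_green * measured_red) / det
    return red, green


if __name__ == "__main__":
    # Hypothetical spot: true red 100, true green 50,
    # with 10% of green bleeding into red and 20% of red into green.
    print(unmix(105.0, 70.0, green_into_red=0.1, red_into_green=0.2))
```

In practice the bleed-through fractions would have to be measured, for example from control spots labeled with only one fluorophore.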

Figure 6 shows the hybridization pattern of the 
two chromosome pools. A red signal indicates that the 
lambda clone on the array surface contains a cloned 
genomic DNA segment from one of the largest six yeast
chromosomes. A green signal indicates that the lambda
clone insert comes from one of the smallest ten yeast
chromosomes. Orange signals indicate repetitive
sequences which cross hybridized to both chromosome
pools. Control spots on the array confirm that the
hybridization is specific and reproducible.

The physical map locations of the genomic DNA
fragments contained in each of the clones used as array
elements have been previously determined by Olson and
co-workers (Riles, et al.) allowing for the automatic
generation of the color karyotype shown in Figure 7.
The color of a chromosomal section on the karyotype
corresponds to the color of the array element
containing the clone from that section. The black
regions of the karyotype represent false negative dark
spots on the array (10%) or regions of the genome not
covered by the Olson clone library (90%). Note that
the largest six chromosomes are mainly red while the
smallest ten chromosomes are mainly green, matching the
original CHEF gel isolation of the hybridization probe.
Areas of the red chromosomes containing green spots and
vice-versa are probably due to spurious sample tracking 
errors in the formation of the original library and in 
the amplification and spotting procedures. 
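
The assignment of a karyotype color to each array element can be sketched as a simple rule on the two corrected intensities. The detection threshold and fold cutoff below are hypothetical values chosen for illustration; the specification does not state the criteria used by the custom image analysis software.

```python
def classify_spot(red: float, green: float,
                  detect: float = 100.0,  # hypothetical detection threshold
                  fold: float = 2.0) -> str:  # hypothetical fold cutoff
    """Assign a display color to one array element from its two intensities."""
    if red < detect and green < detect:
        return "dark"    # no signal above threshold in either channel
    if red >= fold * green:
        return "red"     # predominantly the pool of the six largest chromosomes
    if green >= fold * red:
        return "green"   # predominantly the pool of the ten smallest chromosomes
    return "orange"      # comparable signal from both pools


if __name__ == "__main__":
    for red, green in [(1000, 50), (40, 900), (600, 500), (10, 20)]:
        print((red, green), classify_spot(red, green))
```

Elements classified as "dark" correspond to the black regions of the karyotype, and "orange" elements to sequences that hybridized to both pools.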

The yeast genome arrays have also been probed with
individual clones or pools of clones that are
fluorescently labeled for physical mapping purposes.
The hybridization signals of these clones to the array 
were translated into a position on the physical map of 
yeast. 

Example 2

Total cDNA Hybridized to Micro Arrays of
cDNA Clones with Two-Color
Fluorescent Detection

24 clones containing cDNA inserts from the plant
Arabidopsis were amplified using PCR. Salt was added
to the purified PCR products to a final concentration
of 3 x SSC. The cDNA clones were spotted on poly-l-
lysine coated microscope slides in a manner similar to
Example 1. Among the cDNA clones was a clone
representing a transcription factor HAT4, which had
previously been used to create a transgenic line of the
plant Arabidopsis, in which this gene is present at ten
times the level found in wild-type Arabidopsis (Schena,
et al., 1992).

Total poly-A mRNA from wild type Arabidopsis was 
isolated using standard methods (Maniatis, et al., 
1989) and reverse transcribed into total cDNA, using 
fluorescein nucleotide analog to label the cDNA product
(green fluorescence). A similar procedure was
performed with the transgenic line of Arabidopsis where
the transcription factor HAT4 was inserted into the
genome using standard gene transfer protocols. cDNA
copies of mRNA from the transgenic plant were labeled
with a lissamine nucleotide analog (red fluorescence).
Two micrograms of the cDNA products from each type of
plant were pooled together and hybridized to the cDNA
clone array in a 10 microliter hybridization reaction
in a manner similar to Example 1. Rinsing and
detection of hybridization were also performed in a
manner similar to Example 1. Fig. 8 shows the resulting
hybridization pattern of the array. 

Genes equally expressed in wild type and the
transgenic Arabidopsis appeared yellow due to equal
contributions of the green and red fluorescence to the
final signal. The dots are different intensities of
yellow indicating various levels of gene expression.
The cDNA clone representing the transcription factor
HAT4, expressed in the transgenic line of Arabidopsis
but not detectably expressed in wild type Arabidopsis,
appears as a red dot (with the arrow pointing to it),
indicating the preferential expression of the
transcription factor in the red-labeled transgenic
Arabidopsis and the relative lack of expression of the
transcription factor in the green-labeled wild type
Arabidopsis.
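
The color reading described above can be sketched numerically. The following is a minimal illustration only, not part of the described procedure: it assumes each spot yields a background-corrected green (wild type) and red (transgenic) intensity, and it calls a spot "red" or "green" when one channel exceeds the other by a chosen fold threshold. The function name, the two-fold threshold, and the intensity values are all hypothetical.

```python
import math

def classify_spot(green, red, fold=2.0, floor=1e-9):
    """Label a spot by the ratio of red (transgenic) to green (wild type) signal.

    `fold` is an assumed cutoff; `floor` avoids division by zero for empty spots.
    """
    log_ratio = math.log2((red + floor) / (green + floor))
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "red"      # preferentially expressed in the red-labeled sample
    if log_ratio <= -cutoff:
        return "green"    # preferentially expressed in the green-labeled sample
    return "yellow"       # roughly equal contribution from both labels

# Hypothetical intensities: (green, red) per arrayed clone.
spots = {"HAT4": (5.0, 900.0), "clone_07": (400.0, 420.0), "clone_12": (60.0, 55.0)}
calls = {name: classify_spot(g, r) for name, (g, r) in spots.items()}
print(calls)
```

In this sketch the overall brightness of a "yellow" spot (for example, clone_07 versus clone_12) would still reflect the expression level, as noted in the text.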

An advantage of the microarray hybridization
format for gene expression studies is the high partial
concentration of each cDNA species achievable in the 10
microliter hybridization reaction. This high partial
concentration allows for detection of rare transcripts
without the need for PCR amplification of the
hybridization probe, which may bias the true genetic
representation of each discrete cDNA species.

Gene expression studies such as these can be used
for genomics research to discover which genes are
expressed in which cell types, disease states,
development states or environmental conditions. Gene
expression studies can also be used for diagnosis of
disease by empirically correlating gene expression
patterns to disease states.



Example 3 

Multiplexed Colorimetric Hybridization on
a Gridded Solid Support

A sheet of plastic-backed nitrocellulose was
gridded with barrier elements made from silicone rubber
according to the description in Section IV-A. The
sheet was soaked in 10 x SSC and allowed to dry. As
shown in Fig. 12, 192 M13 clones, each with a different
yeast insert, were arrayed 400 microns apart in four
quadrants of the solid support using the automated
device described in Section III. The bottom left
quadrant served as a negative control for hybridization
while each of the other three quadrants was hybridized
simultaneously with a different oligonucleotide using
the open-face hybridization technology described in
Section IV-A. The first two and last four elements of
each array are positive controls for the colorimetric
detection step.

The oligonucleotides were labeled with fluorescein
which was detected using an anti-fluorescein antibody
conjugated to alkaline phosphatase that precipitated an
NBT/BCIP dye on the solid support (Amersham). Perfect
matches between the labeled oligos and the M13 clones
resulted in dark spots visible to the naked eye and
detected using an optical scanner (HP ScanJet II)
attached to a personal computer. The hybridization
patterns are different in every quadrant, indicating
that each oligo found several unique M13 clones from
among the 192 with a perfect sequence match. Note that
the open capillary printing tip leaves detectable
dimples on the nitrocellulose which can be used to
automatically align and analyze the images.



Although the invention has been described with
respect to specific embodiments and methods, it will be
clear that various changes and modifications may be made
without departing from the invention.
IT IS CLAIMED: 



1. A method of forming a microarray of analyte-assay
regions on a solid support, where each region in
the array has a known amount of a selected,
analyte-specific reagent, said method comprising,

(a) loading a solution of a selected analyte-specific
reagent in a reagent-dispensing device having
an elongate capillary channel (i) formed by
spaced-apart, coextensive elongate members, (ii) adapted to
hold a quantity of the reagent solution and (iii)
having a tip region at which aqueous solution in the
channel forms a meniscus,

(b) tapping the tip of the dispensing device
against a solid support at a defined position on the
surface, with an impulse effective to break the
meniscus in the capillary channel and deposit a
selected volume of solution on the surface, and

(c) repeating steps (a) and (b) until said array
is formed.



2. The method of claim 1, wherein said tapping is
carried out with an impulse effective to deposit a
selected volume in the volume range between 0.01 to 100
nl.

3. The method of claim 1, wherein said channel is
formed by a pair of spaced-apart tapered elements.

4. The method of claim 1, for forming a plurality
of such arrays, wherein step (b) is applied to a
selected position on each of a plurality of solid
supports at each repeat cycle preceding step (c).



WO 95/35505 



PCT/US95/07659 




5. The method of claim 1, which further includes,
after performing steps (a) and (b) at least one time,
reloading the reagent-dispensing device with a new
reagent solution by the steps of (i) dipping the
capillary channel of the device in a wash solution,
(ii) removing wash solution drawn into the capillary
channel, and (iii) dipping the capillary channel into
the new reagent solution.

6. Automated apparatus for forming a microarray
of analyte-assay regions on a plurality of solid
supports, where each region in the array has a known
amount of a selected, analyte-specific reagent, said
apparatus comprising

(a) a holder for holding, at known positions, a
plurality of planar supports,

(b) a reagent dispensing device having an open
capillary channel (i) formed by spaced-apart,
coextensive elongate members (ii) adapted to hold a
quantity of the reagent solution and (iii) having a tip
region at which aqueous solution in the channel forms a
meniscus,

(c) positioning means for positioning the
dispensing device at a selected array position with
respect to a support in said holder,

(d) dispensing means for moving the device into
tapping engagement against a support with a selected
impulse, when the device is positioned at a defined
array position with respect to that support, with an
impulse effective to break the meniscus of liquid in
the capillary channel and deposit a selected volume of
solution on the surface, and

(e) control means for controlling said positioning
and dispensing means.



7. The apparatus of claim 6, wherein said
dispensing means is effective to move said dispensing
device against a support with an impulse effective to
deposit a selected volume in the volume range between
0.01 to 100 nl.

8. The apparatus of claim 6, wherein said channel 
is formed by a pair of spaced-apart tapered elements. 

9. The apparatus of claim 6, wherein the control
means operates to (i) place the dispensing device at a
loading station, (ii) move the capillary channel in the
device into a selected reagent at the loading station,
to load the dispensing device with the reagent, and
(iii) dispense the reagent at a defined array position
on each of the supports on said holder.

10. The apparatus of claim 6, wherein the control
device further operates, at the end of a dispensing
cycle, to wash the dispensing device by (i) placing the
dispensing device at a washing station, (ii) moving the
capillary channel in the device into a wash fluid, to
load the dispensing device with the fluid, and (iii)
remove the wash fluid prior to loading the dispensing
device with a fresh selected reagent.

11. The apparatus of claim 6, wherein said device
is one of a plurality of such devices which are carried
on the arm for dispensing different analyte assay
reagents at selected spaced array positions.

12. A substrate with a surface having a
microarray of at least 10^3 distinct polynucleotide or
polypeptide biopolymers per 1 cm^2 surface area, each
distinct biopolymer sample (i) being disposed at a
separate, defined position in said array, (ii) having a
length of at least 50 subunits, and (iii) being present
in a defined amount between about 0.1 femtomole and 100
nanomoles.

13. The substrate of claim 12, wherein said
surface is a glass slide coated with polylysine, and said
biopolymers are polynucleotides.

14. The substrate of claim 12, wherein said
substrate has a water-impermeable backing, a
water-permeable film formed on the backing, and a grid formed
on the film, where said grid (i) is composed of
intersecting water-impervious grid elements extending
from said backing to positions raised above the surface
of said film, and (ii) partitions the film into a
plurality of water-impervious cells, where each cell
contains such a biopolymer array.

15. A substrate with a surface array of sample- 
receiving cells, comprising 

a water-impermeable backing, 

a water-permeable film formed on the backing, and 
a grid formed on the film, said grid being composed of 
intersecting water-impervious grid elements extending 
from said backing to positions raised above the surface 
of said film. 

16. The substrate of claim 15, wherein the cells 
of the array each contain an array of biopolymers. 

17. A substrate for use in detecting binding of
labeled biopolymers to one or more of a plurality of
distinct polynucleotides, comprising
a non-porous, glass substrate,

a coating of a cationic polymer on said substrate,

and

an array of distinct polynucleotides to said
coating, where each biopolymer is disposed at a
separate, defined position in a surface array of
biopolymers.

18. A method of detecting differential expression
of each of a plurality of genes in a first cell type
with respect to expression of the same genes in a
second cell type, said method comprising

producing fluorescence-labeled cDNA's from mRNA's
isolated from the two cell types, where the cDNA's
from the first and second cells are labeled with first
and second different fluorescent reporters,

adding a mixture of the labeled cDNA's from the
two cell types to an array of polynucleotides
representing a plurality of known genes derived from
the two cell types, under conditions that result in
hybridization of the cDNA's to complementary-sequence
polynucleotides in the array; and

examining the array by fluorescence under
fluorescence excitation conditions in which (i)
polynucleotides in the array that are hybridized
predominantly to cDNA's derived from one of the first
and second cell types give a distinct first or second
fluorescence emission color, respectively, and (ii)
polynucleotides in the array that are hybridized to
substantially equal numbers of cDNA's derived from the
first and second cell types give a distinct combined
fluorescence emission color, respectively,

wherein the relative expression of known genes in
the two cell types can be determined by the observed
fluorescence emission color of each spot.
19. The method of claim 18, wherein the array of
polynucleotides is formed on a substrate with a surface
having an array of at least 10^3 distinct polynucleotide
or polypeptide biopolymers in a surface area of less
than about 1 cm^2, each distinct biopolymer (i) being
disposed at a separate, defined position in said array,
(ii) having a length of at least 50 subunits, and (iii)
being present in a defined amount between about 0.1
femtomole and 100 nmoles.

20. The method of claim 19, wherein said surface 
is a glass slide coated with polylysine, and said 
biopolymers are polynucleotides non-covalently bound to 
said polylysine. 
[57] ABSTRACT 

Methods and compositions for modeling the transcriptional 
responsiveness of an organism to a candidate drug involve 
(a) detecting reporter gene product signals from each of a 
plurality of different, separately isolated cells of a target 
organism, wherein each cell contains a recombinant con- 
struct comprising a reporter gene operatively linked to a 
different endogenous transcriptional regulatory element of 
the target organism such that the transcriptional regulatory 
element regulates the expression of the reporter gene, and 
the sum of the cells comprises an ensemble of the transcrip- 
tional regulatory elements of the organism sufficient to 
model the transcriptional responsiveness of said organism to 
a drug; (b) contacting each cell with a candidate drug; (c) 
detecting reporter gene product signals from each cell; (d) 
comparing reporter gene product signals from each cell 
before and after contacting the cell with the candidate drug 
to obtain a drug response profile which provides a model of 
the transcriptional responsiveness of said organism to the 
candidate drug. 

8 Claims, No Drawings 
METHODS FOR DRUG SCREENING 

BACKGROUND 

The field of the invention is pharmaceutical drug screening.
Pharmaceutical research and development is a multibillion
dollar industry. Many of these resources are consumed
in efforts to focus the specificity of lead compounds.
In addition, many programs are aborted after decades of
costly yet fruitless efforts to limit side effects or toxicity of
candidate drugs. Accordingly, tools that can abbreviate the
research and discovery phase of drug development are
desirable. Several in vitro or cell culture-based methods
have been described for identifying compounds with a
particular biological effect through the activation of a linked
reporter. Gadski et al. (1992) EP 92304902.7 describes
methods for identifying substances which regulate the synthesis
of an apolipoprotein; Evans et al. (1991) U.S. Pat. No.
4,981,784 describes methods for identifying ligand for a
receptor; and Farr et al. (1994) WO 94/17208 describes
methods and kits utilizing stress promoters to determine
toxicity of a compound.

In general, the principle that has been applied in the
existing pharmaceutical industry for the discovery and
development of new lead compounds for drugs has been the
establishment of sensitive and reliable in vitro assays for
purified enzymes, and then screening large numbers of
compounds and culture supernatants for any ability to inhibit
enzyme activity. The present invention exploits the recent
advances in genome science to provide for the rapid screening
of large numbers of compounds against a systemic target
comprising substantially all targets in a pathway, organism,
etc. for rare compounds having the ability to inhibit the
protein of interest. The invention described herein, in effect,
turns the drug discovery process inside out. This invention
provides information on the mechanism of action of every
compound that affects cells, regardless of the target. In
addition, the relative specificity of all lead compounds is
immediately established.

SUMMARY OF THE INVENTION 

The invention provides methods and compositions for
estimating the physiological specificity of a candidate drug.
In general, the subject methods involve (a) detecting reporter
gene product signals from each of a plurality of different,
separately isolated cells of a target organism, wherein each
of said cells contains a recombinant construct comprising a
reporter gene operatively linked to a different endogenous
transcriptional regulatory element (e.g. promoter) of said
target organism such that said transcriptional regulatory
element regulates the expression of said reporter gene,
wherein said plurality of cells comprises an ensemble of the
transcriptional regulatory elements of said organism sufficient
to model the transcriptional responsiveness of said
organism to a drug; (b) contacting each said cell with a
candidate drug; (c) detecting reporter gene product signals
from each of said cells; (d) comparing said reporter gene
product signals from each of said cells before and after
contacting each of said cells with said candidate drug to
obtain a drug response profile; wherein said drug response
profile provides an estimate of the physiological specificity
or biological interactions of said candidate drug.
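
Steps (a) through (d) can be pictured as a simple before/after comparison per reporter. The sketch below is illustrative only and is not taken from the specification: it assumes each reporter strain yields one numeric signal before and one after drug contact, and it expresses the response as a log2 ratio. The function names, the signal floor, the two-fold threshold, and the sample values are hypothetical.

```python
import math

def drug_response_profile(before, after, floor=1.0):
    """Return log2(after / before) for every reporter measured before drug contact.

    `floor` is an assumed lower bound that keeps near-zero signals from
    producing extreme ratios.
    """
    return {
        reporter: math.log2(max(after[reporter], floor) / max(before[reporter], floor))
        for reporter in before
    }

def affected_reporters(profile, threshold=1.0):
    """List reporters whose signal changed by at least 2**threshold fold."""
    return sorted(r for r, change in profile.items() if abs(change) >= threshold)

# Hypothetical reporter signals for three strains of the ensemble.
before = {"ERG10": 100.0, "reporter_A": 250.0, "reporter_B": 80.0}
after = {"ERG10": 400.0, "reporter_A": 260.0, "reporter_B": 20.0}

profile = drug_response_profile(before, after)
print(affected_reporters(profile))
```

The resulting dictionary plays the role of the drug response profile; the fewer reporters it marks as affected, the more specific the candidate drug appears under these assumptions.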

DETAILED DESCRIPTION OF THE 
INVENTION 

The Genome Reporter Matrix. 

The invention provides methods and compositions for
estimating the physiological specificity of a candidate drug
by modeling the transcriptional responses of the target
organism with an ensemble of reporters, the expressions of 
which are regulated by transcription regulatory genetic 
elements derived from the genome of the target organism. 
The ensemble of reporting cells comprises as comprehensive 
a collection of transcription regulatory genetic elements as is 
conveniently available for the targeted organism so as to 
most accurately model the systemic transcriptional response. 
Suitable ensembles generally comprise thousands of indi- 
vidually reporting elements; preferred ensembles are sub- 
stantially comprehensive, i.e. provide a transcriptional 
response diversity comparable to that of the target organism. 
Generally, a substantially comprehensive ensemble requires 
transcription regulatory genetic elements from at least a 
majority of the organism's genes, and preferably includes 
those of all or nearly all of the genes. We term such a 
substantially comprehensive ensemble a genome reporter 
matrix. 

It is frequently convenient to use an ensemble or genome
reporter matrix derived from a lower eukaryote or common
animal model to obtain preliminary information on drug
specificity in higher eukaryotes, such as humans. Because
yeast, such as Saccharomyces cerevisiae, is a bona fide
eukaryote, there is substantial conservation of biochemical
function between yeast and human cells in most pathways,
from the sterol biosynthetic pathway to the Ras oncogene.
Indeed, the absence of many effective antifungal compounds
illustrates how difficult it has been to find therapeutic targets
that would selectively kill fungal but not human cells. One
example of a shared response pathway is sterol biosynthesis.
In human cells, the drug Mevacor (lovastatin) inhibits
HMG-CoA reductase, the key regulatory enzyme of the
sterol biosynthetic pathway. As a result, the level of a
particular regulatory sterol decreases, and the cells respond
by increased transcription of the gene encoding the LDL
receptor. In yeast, Mevacor also inhibits HMG-CoA reductase
and lowers the level of a key regulatory sterol. Yeast
cells respond in an analogous fashion to human cells.
However, yeast do not have a gene for the LDL receptor.
Instead, the same effect is measured by increased transcription
of the ERG10 gene, which encodes acetoacetyl CoA
thiolase, an enzyme also involved in sterol synthesis. Thus,
the regulatory response is conserved between yeast and
humans, even though the identity of the responding gene is
different.

Advantages of the Genome Reporter Matrix as a 
Vehicle for Pharmaceutical Development 

The advantages of the subject methods over prior art
screening methods may be illustrated by examples. Consider
the difference between an in vitro assay for HMG-CoA
reductase inhibitors as presently practiced by the pharmaceutical
industry, and an assay for inhibitors of sterol biosynthesis
as revealed by the ERG10 reporter. In the case of
the former, information is obtained only for those rare
compounds that happen to inhibit this one enzyme. In
contrast, in the case of the ERG10 reporter, any compound
that inhibits nearly any of the approximately 35 steps in the
sterol biosynthetic pathway will, by lowering the level of
intracellular sterols, induce the synthesis of the reporter.
Thus, the reporter can detect a much broader range of targets
than can the purified enzyme, in this case 35 times more than
the in vitro assay.

Drugs often have side effects that are in part due to the 
lack of target specificity. However, the in vitro assay of 
HMG-CoA reductase provides no information on the specificity
of a compound. In contrast, a genome reporter matrix
reveals the spectrum of other genes in the genome also 
affected by the compound. In considering two different 
compounds both of which induce the ERG10 reporter, if one 
compound affects the expression of 5 other reporters and a 
second compound affects the expression of 50 other report- 
ers, the first compound is, a priori, more likely to have fewer 
side effects. Because the identity of the reporters is known 
or determinable, information on other affected reporters is 
informative as to the nature of the side effect. A panel of 
reporters can be used to test derivatives of the lead com- 
pound to determine which of the derivatives have greater 
specificity than the first compound. 
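
The comparison of two ERG10-inducing compounds can be expressed as a count of off-target reporters. The following is a hedged sketch rather than a prescribed procedure: it assumes each compound is summarized as the set of reporters it affects, and it ranks compounds by how many reporters they affect beyond the intended one. All names and reporter labels are hypothetical.

```python
def off_target_count(affected, intended):
    """Count affected reporters other than the intended ones."""
    return len(set(affected) - set(intended))

def rank_by_specificity(compounds, intended):
    """Order compound names from fewest to most off-target reporters."""
    return sorted(compounds, key=lambda name: off_target_count(compounds[name], intended))

# Hypothetical data mirroring the text: both compounds induce ERG10, one
# affects 5 other reporters and the other affects 50.
compounds = {
    "compound_2": {"ERG10"} | {f"reporter_{i}" for i in range(50)},
    "compound_1": {"ERG10"} | {f"reporter_{i}" for i in range(5)},
}

ranking = rank_by_specificity(compounds, intended={"ERG10"})
print(ranking)
```

The same ranking could be applied to derivatives of a lead compound to find those with greater specificity than the original.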

As another example, consider the case of a compound that
neither affects the in vitro assay for HMG-CoA reductase
nor induces the expression of the ERG10 reporter. In the
traditional approach to drug discovery, a compound that
does not inhibit the target being tested provides no useful
information. However, a compound having any significant
effect on a biological process generally has some consequence
on gene expression. A genome reporter matrix can
thus provide two different kinds of information for most
compounds. In some cases, the identity of reporter genes
affected by the inhibitor provides evidence as to how the inhibitor
functions. For example, a compound that induces a cAMP-dependent
promoter in yeast may affect the activity of the
Ras pathway. Even where the compound affects the expression
of a set of genes that do not evidence the action of the
compound, the matrix provides a comprehensive assessment
of the action of the compound that can be stored in a
database for later analyses. A library of such matrix response
profiles can be continuously investigated, much as the
Spectral Compendiums of chemistry are continually referenced
in the chemical arts. For example, if the database
reveals that compound X alters the expression of gene Y, and
a paper is published reporting that the expression of gene Y
is sensitive to, for example, the inositol phosphate signaling
pathway, compound X is a candidate for modulating the
inositol phosphate signaling pathway. In effect, the genome
reporter matrix is an informational translator that takes
information on a gene directly to a compound that may
already have been found to affect the expression of that gene.
This tool should dramatically shorten the research and
discovery phase of drug development, and effectively leverage
the value of the publicly available research portfolio on
all genes.
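
The "informational translator" idea amounts to looking up stored response profiles by gene. A minimal sketch follows, assuming the database is simply a mapping from each compound to the set of genes whose reporters it altered; the index structure, function names, and entries are hypothetical and stand in for whatever storage is actually used.

```python
from collections import defaultdict

def build_gene_index(profiles):
    """Invert compound -> affected genes into gene -> compounds."""
    index = defaultdict(set)
    for compound, genes in profiles.items():
        for gene in genes:
            index[gene].add(compound)
    return index

def candidates_for_pathway(index, pathway_genes):
    """Return compounds already known to alter any gene tied to a pathway."""
    hits = set()
    for gene in pathway_genes:
        hits |= index.get(gene, set())
    return sorted(hits)

# Hypothetical stored response profiles.
profiles = {"compound_X": {"gene_Y", "gene_Q"}, "compound_Z": {"gene_Q"}}
index = build_gene_index(profiles)

# A new publication links gene_Y to the inositol phosphate signaling pathway.
print(candidates_for_pathway(index, {"gene_Y"}))
```

Under these assumptions, each new report linking a gene to a pathway becomes a query against profiles that were collected earlier, with no further screening required.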

In many cases, a drug of interest would work on protein 
targets whose impact on gene expression would not be 
known a priori. The genome reporter matrix can neverthe- 
less be used to estimate which genes would be induced or 
repressed by the drug. In one embodiment, a dominant
mutant form of the gene encoding a drug-targeted protein is 
introduced into all the strains of the genome reporter matrix 
and the effect of the dominant mutant, which interferes with 
the gene product's normal function, evaluated for each 
reporter. This genetic assay informs us which genes would 
be affected by a drug that has a similar mechanism of action. 
In many cases, the drug itself could be used to obtain the 
same information. However, even if the drug itself were not 
available, genetics can be used to predetermine what its 
response profile would be in the genome reporter matrix. 
Furthermore, it is not necessary to know the identity of any 
of the responding genes. Instead, the genetic control with the 
dominant mutant sorts the genome into those genes that 
respond and those that do not. Hence, if drugs that disrupt a 
given cellular function were desired, dominant mutants for 
such function introduced into the genome reporter matrix 
reveal what response profile to expect for such an agent. 
For example, taxol, a recent advance in potential breast 
cancer therapies, has been shown to interfere with tubulin- 
based cytoskeletal elements. Hence, a dominant mutant form 
of tubulin provides a response profile informative for breast 
cancer therapies with similar modes of action to taxol. 
Specifically, a dominant mutant form of tubulin is intro- 
duced into all the strains of the genome reporter matrix and 
the effect of this dominant mutant, which interferes with the 
microtubule cytoskeleton, evaluated for each reporter. Thus, 
any new compound that induces the same response profile as 
the dominant tubulin mutant would provide a candidate for 
a taxol-like pharmaceutical. 
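
Matching a compound to the profile of a dominant mutant can be treated as a similarity search. The sketch below is one possible reading, not a method stated in the specification: it assumes profiles are log-ratio values over the same reporters and uses Pearson correlation as the similarity measure. The choice of correlation, the names, and the numbers are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no pattern to compare
    return cov / (sd_x * sd_y)

def rank_mimics(reference, candidates):
    """Rank candidate compounds by similarity to a reference profile."""
    reporters = sorted(reference)
    ref = [reference[r] for r in reporters]
    scores = {
        name: pearson(ref, [profile[r] for r in reporters])
        for name, profile in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical log-ratio profiles over four reporters.
tubulin_mutant = {"rep_1": 2.0, "rep_2": -1.5, "rep_3": 0.0, "rep_4": 1.0}
candidates = {
    "cmpd_A": {"rep_1": 1.8, "rep_2": -1.2, "rep_3": 0.1, "rep_4": 0.9},
    "cmpd_B": {"rep_1": -0.2, "rep_2": 1.9, "rep_3": 0.0, "rep_4": -1.1},
}

ranked = rank_mimics(tubulin_mutant, candidates)
print(ranked[0][0])
```

Here the top-ranked compound would be the candidate for a taxol-like agent; other similarity measures could be substituted without changing the overall approach.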

In addition, the genome reporter matrix can be used to 
genetically create or model various disease states. In this 
way, pathways present specifically in the disease state can be 
targeted. For example, the specific response profile of transforming
mutant Ras2^val19 identifies Ras2^val19-induced
reporters. Here, the matrix, in which each unit contains the
Ras2^val19 mutation, is used to screen for compounds that
restore the response profile to that of the matrix lacking the
mutation.

Though these examples are directed to the development of 
human therapeutics, informative response profiles can often 
be obtained in nonhuman reporter matrices. Hence, for 
disease causing genes with yeast homologs, even if the 
function of the gene is not known, a dominant form of the 
gene can be introduced into a yeast-based reporter matrix to 
identify disease state specific pathways for targeting. For 
example, a reporter matrix comprising the yeast mutant
Ras2^val19 provides a discovery vehicle for pathways specific
to the human analog, the oncogene Ras2^val12.

Application of Novel Combinatorial Chemistries 
with the Genome Reporter Matrix. 

Among the most important advances in drug development 
have been advances in combinatorial synthesis of chemical 
libraries. In conventional drug screening with purified
enzyme targets, combinatorial chemistries can often help 
create new derivatives of a lead compound that will also 
inhibit the target enzyme but with some different and desir- 
able property. However, conventional methods would fail to 
recognize a molecule having a substantially divergent speci- 
ficity. The genome reporter matrix offers a simple solution to 
recognizing new specificities in combinatorial libraries. Spe- 
cifically, pools of new compounds are tested as mixtures 
across the matrix. If the pool has any new activity not 
present in the original lead compound, new genes are 
affected among the reporters. The identity of that gene 
provides a guide to the target of the new compound. Fur- 
thermore, the matrix offers an added bonus that compensates 
for a common weakness in most chemical syntheses. Spe- 
cifically, most syntheses produce the desired product in 
greatest abundance and a collection of other related products 
as contaminants due to side reactions in the synthesis. 
Traditionally the solution to contaminants is to purify away 
from them. However, the genome reporter matrix exploits 
the presence of these contaminants. Syntheses can be 
adjusted to make them less specific with a greater number of 
side reactions and more contaminants to determine whether 
anything in the total synthesis affects the expression of target 
genes of interest. If there is a component of the mixture with 
the desired activity on a particular reporter, that reporter can 
be used to assay purification of the desired component from 
the mixture. In effect, the reporter matrix allows a focused 
survey of the effect on single genes to compensate for the 
impurity of the mixture being tested. 
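
Recognizing a new specificity in a pooled mixture reduces to comparing two sets of affected reporters. A minimal sketch, assuming the lead compound and the pool are each summarized by the reporters they affect (the names and reporter labels are hypothetical):

```python
def new_activities(lead_affected, pool_affected):
    """Reporters affected by the pool but not by the original lead compound."""
    return sorted(set(pool_affected) - set(lead_affected))

# Hypothetical results: the pool affects one reporter the lead does not.
lead = {"ERG10"}
pool = {"ERG10", "reporter_17"}

novel = new_activities(lead, pool)
print(novel)
```

Any reporter returned here would then serve as the assay for tracking purification of the responsible component from the mixture.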
Isoprenoids are an especially attractive class for the genome
reporter matrix. In nature, isoprenoids are the champion
signaling molecules. Isoprenoids are derivatives of the five
carbon compound isoprene, which is made as an intermediate
in cholesterol biosynthesis. Isoprenoids include many
of the most famous fragrances, pigments, and other biologically
active compounds, such as the antifungal sesquiterpenoids,
which plants use defensively against fungal infection.
There are roughly 10,000 characterized isoprene derivatives
and many more potential ones. Because these compounds
are used in nature to signal biological processes, they are
likely to include some of the best membrane permeant
molecules.

Isoprenes possess another characteristic that lends itself
well to drug discovery through the genome reporter matrix.
Pure isoprenoid compounds can be chemically treated to
create a wide mixture of different compounds quickly and
easily, due to the particular arrangement of double bonds in
the hydrocarbon chains. In effect, isoprenoids can be
mutagenized from one form into many different forms much
as a wild-type gene can be mutagenized into many different
mutants. For example, vitamin D used to fortify milk is
produced by ultraviolet irradiation of the isoprene derivative
known as ergosterol. New biologically active isoprenoids
are generated and analyzed with a genome reporter matrix as
follows. First, a pure isoprenoid such as limonene is tested to
determine its response profile across the matrix. Next, the
isoprenoid (e.g. limonene) is chemically altered to create a
mixture of different compounds. This mixture is then tested
across the matrix. If any new responses are observed, then
the mixture has new biologically active species. In addition,
the identity of the reporter genes provides information
regarding what the new active species does, an activity to be
used to monitor its purification, etc. This strategy is also
applied to other mutable chemical families in addition to
isoprenoids.

Applications of the Genome Reporter Matrix in 
Antibiotic and Antifungal Discovery. 



Fungi are important pathogens of plants and animals and
make a major impact on the production of many food crops
and on animal, including human, health. One major difficulty
in the development of antifungal compounds has been
the problem of finding pharmaceutical targets in fungi that
are specific to the fungus. The genome reporter matrix offers
a new tool to solve this problem. Specifically, all molecules
that fail to elicit any response in the Saccharomyces reporter
are collected into a set, which by definition must be either
inactive biologically or have a very high specificity. A
reporter library is created from the targeted pathogen such as
Cryptococcus, Candida, Aspergillus, Pneumocystis, etc. All
molecules from the set that do not affect Saccharomyces are
tested on the pathogen, and any molecule that elicits an
altered response profile in the pathogen in principle identifies
a target that is pathogen-specific. As an example, a
pathogen may have a novel signaling enzyme, such as an
inositol kinase that alters a position on the inositol ring that
is not altered in other species. A compound that inhibits that
enzyme would affect the signaling pathway in the pathogen
and alter a response profile, but due to the absence of that
enzyme in other organisms, would have no effect. By
sequencing the reporter genes affected specifically in the
target fungus and comparing the sequence with others in
Genbank, one can identify biochemical pathways that are
unique to the target species. Useful identified products
include not only agents that kill the target fungus but also the
identification of specific targets in the fungus for other
pharmaceutical screening assays.

The identification of compounds that kill bacteria has
been successfully pursued by the pharmaceutical industry
for decades. It is rather simple to spot a compound that kills
bacteria in a spot test on a petri plate. Unfortunately, growth
inhibition screens have provided very limited lead compound
diversity. However, there is much complexity to
bacterial physiology and ecology that could offer an edge to
development of combination therapies for bacteria, even for
compounds that do not actually kill the bacterial cell.
Consider, for example, the bacteria that invade the urethra
and persist there through the elaboration of surface attachments
known as fimbriae. Antibiotics in the urine stream
have limited access to the bacteria because the urine stream
is short-lived and infrequent. However, if one could block
the synthesis of the fimbriae to detach the bacteria, existing
therapies would become more effective. Similarly, if the
chemotaxis mechanism of bacteria were crippled, the ability
of bacteria to establish an effective infection would, in some
species, be compromised. A genome reporter matrix for a
bacterial pathogen that contains reporters for the expression
of genes involved in chemotaxis or fimbriae synthesis, as
examples, identifies not only compounds that do kill the
bacteria in a spot test, but also those that interfere with key
steps in the biology of the pathogen. These compounds
would be exceedingly difficult to discover by conventional
means.

Applications of Human Cell Based Genome 
Reporter Matrices. 

A genome reporter matrix based on human cells provides 
many important applications. For example, an interesting 
application is the development of antiviral compounds. 
When human cells are infected by a wide range of viruses, 
the cells respond in a complex way in which only a few of 
the components have been identified. For example, certain 
interferons are induced, as is a double-stranded RNase. Each 
of these responses individually provides some measure of 
protection. A matrix that reports the induction of interferon 
genes and the double stranded RNase is able to detect 
compounds that could prophylactically protect cells before 
the arrival of the virus. Other protective effects may be 
induced in parallel. The incorporation of a panel of other 
reporter genes in the matrix is used to identify those com- 
pounds with the highest degree of specificity. 

Use of the Genome Reporter Matrix. 

The procedure to be followed in the subject methods will 
now be outlined. The initial step involves determining the 
basal or background response profile by detecting reporter 
gene product signals from each of a plurality of different, 
separately isolated cells of a target organism under one or 
more of a variety of physical conditions, such as temperature 
and pH, medium, and osmolarity. As discussed above, the 
target organism may be a yeast, animal model, human, plant, 
pathogen, etc. Generally, the cells are arranged in a physical 
matrix such as a microtiter plate. Each of the cells contains 
a recombinant construct comprising a reporter gene operatively 
linked to a different endogenous transcriptional regulatory 
element of said target organism such that said transcriptional 
regulatory element regulates the expression of 
said reporter gene. A sufficient number of different recombinant 
cells are included to provide an ensemble of transcriptional 
regulatory elements of said organism sufficient to 
model the transcriptional responsiveness of said organism to 
a drug. In a preferred embodiment, the matrix is substan- 
tially comprehensive for the selected regulatory elements, 
e.g. essentially all of the gene promoters of the targeted 
organism are included. Other cis-acting or trans-acting tran- 
scription regulatory regions of the targeted organism can 
also be evaluated. In one embodiment, a genome reporter 
matrix is constructed from a set of lacZ fusions to a 
substantially comprehensive set of yeast genes. The fusions 
are preferably constructed in a diploid cell of the a/α mating 
type to allow the introduction of dominant mutations by 
mating, though haploid strains also find use with particularly 
sensitive reporters for certain functions. The fusions are 
conveniently arrayed onto a microtiter plate having 96 wells 
separating distinct fusions into wells having defined alpha- 
numeric X-Y coordinates, where each well (defined as a 
unit) confines a cell or colony of cells having a construct of 
a reporter gene operatively joined to a different transcriptional 
promoter. Permanent collections of these plates are 
readily maintained at -80° C. and copies of this collection 
can be made and propagated by simple mechanics and may 
be automated with commercial robotics. 

The methods involve detecting a reporter gene product 
signal for each cell of the matrix. A wide variety of reporters 
may be used, with preferred reporters providing conve- 
niently detectable signals (e.g. by spectroscopy). Typically, 
the signal is a change in one or more electromagnetic 
properties, particularly optical properties at the unit. As 
examples, a reporter gene may encode an enzyme which 
catalyzes a reaction at the unit which alters light absorption 
properties at the unit, radiolabeled or fluorescent tag-labeled 
nucleotides can be incorporated into nascent transcripts 
which are then identified when bound to oligonucleotide 
probes, etc. Examples include β-galactosidase, invertase, 
green fluorescent protein, etc. Invertase fusions have the 
virtue that functional fusions can be selected from complex 
libraries by the ability of invertase to allow growth on sucrose; 
genes whose expression increases or decreases are identified by 
measuring the relative growth on medium containing sucrose with or 
without the compound of interest. Electronic detectors for 
optical, radiative, etc. signals are commercially available, 
e.g. automated, multi-well colorimetric detectors, similar to 
automated ELISA readers. Reporter gene product signals 
may also be monitored as a function of other variables such 
as stimulus intensity or duration, time (for dynamic response 
analyses), etc. 

In a preferred embodiment, the basal response profiles are 
determined through the colorimetric detection of a lacZ 
reaction product. The optical signal generated at each well is 
detected and linearly transduced to generate a corresponding 
digital electrical output signal. The resultant electrical out- 
put signals are stored in computer memory as a genome 
reporter output signal matrix data structure associating each 
output signal with the coordinates of the corresponding 
microtiter plate well and the stimulus or drug. This infor- 
mation is indexed against the matrix to form reference 
response profiles that are used to determine the response of 
each reporter to any milieu in which a stimulus may be 
provided. 
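
The output signal matrix data structure described above, which associates each digitized signal with its well coordinates and the stimulus, might be represented as follows. This is a minimal sketch, assuming a standard 96-well plate with rows A to H and columns 1 to 12; the class and function names are hypothetical and the signal values are placeholders, not measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WellKey:
    """Identifies one reading: plate number, well coordinate, and stimulus."""
    plate: int
    well: str       # alphanumeric X-Y coordinate, e.g. "A1" .. "H12"
    stimulus: str   # e.g. "basal" or the name of a drug


def well_coordinates(rows="ABCDEFGH", columns=12):
    """List the well labels of a 96-well microtiter plate in row-major order."""
    return [f"{row}{col}" for row in rows for col in range(1, columns + 1)]


def store_plate(matrix, plate, stimulus, signals):
    """Store one plate's digitized signals, keyed by plate, well, and stimulus."""
    wells = well_coordinates()
    if len(signals) != len(wells):
        raise ValueError("expected one signal per well")
    for well, signal in zip(wells, signals):
        matrix[WellKey(plate, well, stimulus)] = float(signal)
    return matrix


# Placeholder readings for one basal plate.
matrix = {}
store_plate(matrix, plate=1, stimulus="basal", signals=[0.1 * i for i in range(96)])
print(len(matrix), matrix[WellKey(1, "A2", "basal")])
```

Keying on the stimulus as well as the well lets basal and drug-treated readings for the same reporter sit side by side in one structure.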

After establishing a basal response profile for the matrix, 
each cell is contacted with a candidate drug. The term drug 
is used loosely to refer to agents which can provoke a 
specific cellular response. Preferred drugs are pharmaceuti- 
cal agents, particularly therapeutic agents. The drug induces 
a complex response pattern of repression, silence and induction 
across the matrix (i.e. a decrease in reporter activity at 
some units, an increase at others, and no change at still 
others). The response profile reflects the cell's transcriptional 
adjustments to maintain homeostasis in the presence 
of the drug. While a wide variety of candidate drugs can be 
evaluated, it is important to adjust the incubation conditions 
(e.g. concentration, time, etc.) to preclude cellular stress, and 
hence ensure the measurements of pharmaceutically relevant 
response profiles. Hence, the methods monitor transcrip- 
tional changes which the cell uses to maintain cellular 
homeostasis. Cellular stress may be monitored by any con- 
venient way such as membrane potential (e.g. dye exclu- 
sion), cellular morphology, expression of stress response 
genes, etc. In a preferred embodiment, the compound treat- 
ment is performed by transferring a copy of the entire matrix 
to fresh medium containing the first compound of interest. 

After contacting the cells with the candidate drug, the 
reporter gene product signals from each of said cells are again 
measured to determine a stimulated response profile. The 
basal or background response profile is then compared with 
(e.g. subtracted from, or divided into) the stimulated 
response profile to identify the cellular response profile to 
the candidate drug. The cellular response can be character- 
ized in a number of ways. For example, the basal profile can 
be subtracted from the stimulated profile to yield a net 
stimulation profile. In another embodiment, the stimulated 
profile is divided by the basal profile to yield an induction 
ratio profile. Such comparison profiles provide an estimate 
of the physiological specificity of the candidate drug. 
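
The two comparison profiles named above reduce to elementwise arithmetic over the matrix units. The sketch below assumes each profile is a mapping from unit (well) label to signal; the function names, the small floor used to avoid division by zero, and the numbers are illustrative assumptions.

```python
def net_stimulation(basal, stimulated):
    """Subtract the basal profile from the stimulated profile, unit by unit."""
    return {unit: stimulated[unit] - basal[unit] for unit in basal}


def induction_ratio(basal, stimulated, floor=1e-9):
    """Divide the stimulated profile by the basal profile, unit by unit.

    The floor is an assumption of this sketch; it only guards against a
    zero basal signal.
    """
    return {unit: stimulated[unit] / max(basal[unit], floor) for unit in basal}


# Illustrative signals for three units: induced, repressed, and silent.
basal = {"A1": 2.0, "A2": 4.0, "A3": 1.0}
stimulated = {"A1": 6.0, "A2": 1.0, "A3": 1.0}

print(net_stimulation(basal, stimulated))
print(induction_ratio(basal, stimulated))
```

Here A1 shows induction (positive net stimulation, ratio above 1), A2 shows repression, and A3 is silent.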

In another embodiment of the invention, a matrix of 
hybridization probes corresponding to a predetermined 
population of genes of the selected organism is used to 
specifically detect changes in gene transcription which result 
from exposing the selected organism or cells thereof to a 
candidate drug. In this embodiment, one or more cells 
derived from the organism are exposed to the candidate drug 
in vivo or ex vivo under conditions wherein the drug effects 
a change in gene transcription in the cell to maintain 
homeostasis. Thereafter, the gene transcripts, primarily 
mRNA, of the cell or cells are isolated by conventional 
means. The isolated transcripts or cDNAs complementary 
thereto are then contacted with an ordered matrix of hybrid- 
ization probes, each probe being specific for a different one 
of the transcripts, under conditions wherein each of the 
transcripts hybridizes with a corresponding one of the 
probes to form hybridization pairs. The ordered matrix of 
probes provides, in aggregate, complements for an ensemble 
of genes of the organism sufficient to model the transcrip- 
tional responsiveness of the organism to a drug. The probes 
are generally immobilized and arrayed onto a solid substrate 
such as a microtiter plate. Specific hybridization may be 
effected, for example, by washing the hybridized matrix 
with excess non-specific oligonucleotides. A hybridization 
signal is then detected at each hybridization pair to obtain a 
matrix-wide signal profile. A wide variety of hybridization 
signals may be used; conveniently, the cells are pre-labeled 
with radionuclides such that the gene transcripts provide 
a radioactive signal that can be detected in the hybridization 
pairs. The matrix-wide signal profile of the drug-stimulated 
cells is then compared with a matrix-wide signal profile of 
negative control cells to obtain a specific drug response 
profile. 

The invention also provides means for computer-based 
qualitative analysis of candidate drugs and unknown com- 
pounds. A wide variety of reference response profiles may be 
generated and used in such analyses. For example, the 
response of a matrix to loss of function of each protein or 
gene or RNA in the cell is evaluated by introducing a 
dominant allele of a gene to each reporter cell, and deter- 
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In an alternative embodiment th? . ■ • 

de veloped in a stnrin defide ^ tlSSSLCL" 
wherem the majority of nonsense notations caused?' 

the ^SSSmSR K'STr 

prenmure tenninarion codonTln ZZ£t££ B ^ 
fimcaon most nonsense mutations encoTshortZS 
protein fragments. Many of these mtafere ^.hTn™ i 
proteinfunction and heni have tanSSno^^ 

The resultant data identify genetic response profiles 
Thes : data arc sorted by individual gene response ESS 
s P ecific "y * each gene to a particular Ttim£\ 
weighting matrix is established which weiohrc thT? , 

r , h Z Spt ! nse P™ m « an unknown stimulus (eg new 
chemicals, unknown compounds or unknown nrixt ™5i !, 
be analyzed by comparing E£ 

Such analyses generally take the form of an indexed 
report of the matches to the reference chemical response 
profiles, ranked according to the weighted value of each 
matching reporter. If there is a match (U^rlTl^^ 

one of the known compounds unon »wa a ^ 
new compound is a candidate for a mnie™!* J-X 

be^mn^T '"J?" ° fa Dew cnemical ^mulus may also «o 
oe compared to a known genetic resrxmw nmfii- rT . 

profiles, die target gene or its functional wta^MMta 
presumptive target of the chemical. If the chffi^rS 

nathwav if ^ , 6 mUtant « ene but m «"» same 
pathway. If the chemical response profile includes as a 
EXAMPLES 
1. Transcriptional promoter-reporter gene matrix 

^sssesse^ ***** * ,he 

Se^r^n^ ™rauo* of SSK 

uSrT?-^ 118 W3S dCtennined * «««ton Eta S 
ugAml. To produce a mevinolin-stimuiated matrix eachweH 
of 60 microtiter plates is filled with lOOu l ^Un^l 

each well allowing for a dilution of appmxirnatelyTlOO 
The cells are incubated in the medium until the tobi«itv of 
the average reporter increases by 20 fold. Each weH t Ln 
MM for mrbidi ty as a me Jure of 
with alysis solution to allow measurement of B-galactofi 
dase from each fusion. P S 313 " 051 - 

B) Generation of an output signal matrix data structure 
Both the turbidity and the B-galactosidase an. ™a 
conrmercially availabk nuooti JS^L"^ 
Rad) and me data captured as an ASCII file, r^m this to 
» mdiYidUal CE,ls in te W n»S £ a 

ISC 0 ; ,n ^ reference res P onse ^ 

suotracted. The difference corresponds to the mevinnUn 

ssst y fi,e is « «- " 

tawe indexed by the response of each cell to the inhibitor 
For example, the genes encoding acetoacetvl-CoATn JT. 
A physical matrix is constructed as described above except 
the mevinolin is replaced with an unknown test compound. 
The resultant response profile is compared to the response 
profiles of a library of known bioactive compounds and 
analyzed as described above. For example, if the test com- 
pound output profile shows both acetoacetyl-CoA thiolase 
and squalene synthase gene induced, then the output profile 
matches that expected of an inhibitor of cholesterol synthe- 
sis. If the response profile has fewer other cells affected than 
the response profile to mevinolin, the unknown compound is 
a candidate for greater specificity. If the response profile of 
the new chemical affects fewer other reporters than the 
response profile to mevinolin, and if the other reporters 
affected by mevinolin have a lower weighted value, then the 
compound is a candidate for greater specificity. If the 
response profile has more different cells affected than the 
response profile to mevinolin, then the compound is a 
candidate for less specificity. In the case where mixtures of 
compounds are tested, the highest weighted responses are 
evaluated to determine whether they can be deconvoluted 
into the response profiles of two different compounds, or of 
two different genetic response profiles. 
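
The comparison against a reference profile described in this example can be sketched as set operations on the affected reporters. This is a minimal illustration under assumptions: profiles are reduced to sets of affected reporters, the weights are arbitrary placeholders, the reporter labels other than the two named genes are hypothetical, and the "similar specificity" outcome is added here only to make the function total.

```python
def weighted_match_score(test_affected, reference_affected, weights):
    """Sum the weights of reporters affected in both profiles (default weight 1.0)."""
    return sum(weights.get(r, 1.0) for r in test_affected & reference_affected)


def compare_specificity(test_affected, reference_affected, signature):
    """Classify a test compound relative to a reference that shares a signature."""
    if not signature <= test_affected:
        return "no match"
    test_other = test_affected - signature
    reference_other = reference_affected - signature
    if len(test_other) < len(reference_other):
        return "candidate for greater specificity"
    if len(test_other) > len(reference_other):
        return "candidate for less specificity"
    return "similar specificity"


# Reporters induced by an inhibitor of cholesterol synthesis, per the text.
signature = {"acetoacetyl_CoA_thiolase", "squalene_synthase"}
# Hypothetical additional reporters affected by the reference and test compounds.
mevinolin = signature | {"rep_1", "rep_2", "rep_3"}
test_compound = signature | {"rep_1"}
weights = {"acetoacetyl_CoA_thiolase": 5.0, "squalene_synthase": 5.0, "rep_1": 0.5}

print(compare_specificity(test_compound, mevinolin, signature))
print(weighted_match_score(test_compound, mevinolin, weights))
```

The weighted score is what a ranked report of matches would sort on; the classification only counts how many reporters outside the signature each compound affects.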

2. Reporter transcript-oligonucleotide hybridization probe 
matrix: Construction of stimulated physical matrix and 
generation of an output signal matrix data structure. 

Unlabeled oligonucleotide hybridization probes complementary 
to the mRNA transcript of each yeast gene are 
arrayed on a silicon substrate etched by standard techniques 
(e.g. Fodor et al. (1991) Science 252, 767). The probes are 
of length and sequence to ensure specificity for the corresponding 
yeast gene, typically about 24-240 nucleotides in 
length. 

A confluent HeLa cell culture is treated with 15 µg/ml 
mevinolin in 2% ethanol for 4 hours while maintained in a 
humidified 5% CO2 atmosphere at 37° C. Messenger RNA 
is extracted, reverse transcribed and fluorophore-labeled 
according to standard methods (Sambrook et al., Molecular 
Cloning 3rd ed). The resultant cDNA is hybridized to the 
probe array, which is washed free of unhybridized 
labeled cDNA, the hybridization signal at each unit of the 
array quantified using a confocal microscope scanner 
(instruments by Molecular Devices and Affymetrix) and the 
resultant matrix response data stored in digital form. 
3. Two-dimensional two-hybrid matrix 
A) Construction of stimulated physical matrix. 

A two-dimensional two-hybrid (see, e.g. Chien et al. 
(1991) PNAS, 88, 9578) matrix is designed to screen 
compounds that specifically affect the interaction of two 
proteins, e.g. the interaction of a human signal transducer 
and activator of transcription (STAT) with an interleukin 
receptor. Two hybrid fusions are generated by standard 

u S* stram contains a P^on of the targeted 

human STAT gene, fused to a portion of a yeast or baS 

S?*i5? dmg 3 ° NA Wnding domain <** GAL4;1-147) 
, Tt^ 5 ?? 11 ? 100 ^og" 1 ' 2 ^ by that DNA binding domain 
(e.g UAS C ) is inserted in place of the enhancer sequence 5' 
°* C porter (e.g. lacZ). T^e strain also contains 

another fus t on consistmg of an intracellular portion of the 
^^A^ t0r gme Wh0se Product interacts with 

^3 gCne * **** with a g^e fragment 

G^68- a 88ir SCnPtlODal aCtiVad0n 
B) Generation of signal matrix data structure. 
Both the turbidity and the β-galactosidase are read on 
commercial microtiter plate readers (BioRad) and the data 
captured as an ASCII file. 

C) Comparison of signal matrix data structure with database. 
Data are analyzed for those compounds that block the 
interaction of the two human proteins by reducing the signal 
produced from the reporter in the various strains containing 
pairs of human proteins. The output is processed to identify 
compounds with a large impact on a reporter whose expres- 
sion is dependent on a single pair of interacting human 
proteins. An inverted weighting matrix is used to evaluate 
these data as preferred compounds do not affect even the 
least specific reporters in the matrix. 
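
The selection of pair-specific compounds described above can be illustrated with a simple filter. This sketch does not implement the inverted weighting matrix itself; as a stated simplification it uses two fixed thresholds, and the strain labels, thresholds, and signal ratios are hypothetical.

```python
def pair_specific_compounds(signal_ratios, min_reduction=0.5, tolerance=0.1):
    """Select compounds that strongly reduce exactly one strain's reporter signal.

    signal_ratios maps compound -> {strain: treated signal / untreated signal}.
    A compound qualifies only if one strain is reduced by at least
    min_reduction and every other strain stays within tolerance of 1.0.
    """
    selected = {}
    for compound, ratios in signal_ratios.items():
        reduced = [s for s, r in ratios.items() if r <= 1.0 - min_reduction]
        unchanged = [s for s, r in ratios.items() if abs(r - 1.0) <= tolerance]
        if len(reduced) == 1 and len(reduced) + len(unchanged) == len(ratios):
            selected[compound] = reduced[0]
    return selected


# Hypothetical ratios for three compounds across three two-hybrid strains.
ratios = {
    "cmpd_1": {"STAT_x_ILR": 0.2, "pair_2": 1.0, "pair_3": 0.95},
    "cmpd_2": {"STAT_x_ILR": 0.3, "pair_2": 0.4, "pair_3": 1.0},
    "cmpd_3": {"STAT_x_ILR": 0.2, "pair_2": 0.7, "pair_3": 1.0},
}
print(pair_specific_compounds(ratios))
```

Only cmpd_1 passes: cmpd_2 reduces two pairs, and cmpd_3 partially affects a second pair, which the text treats as disqualifying.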

All publications and patent applications cited in this 
specification are herein incorporated by reference as if each 
individual publication or patent application were specifically 
and individually indicated to be incorporated by reference. 
Although the foregoing invention has been described in 
some detail by way of illustration and example for purposes 
of clarity of understanding, it will be readily apparent to 
those of ordinary skill in the art in light of the teachings of 
this invention that certain changes and modifications may be 
made thereto without departing from the spirit or scope of 
the appended claims. 
What is claimed is: 

1. A method for modeling of the transcriptional responsiveness 
of an organism to a candidate drug which has an 
effect on gene transcription in cells of said organism, comprising 
steps: 

(a) detecting reporter gene product signals from each of a 
plurality of different, separately isolated cells of a target 
organism, wherein each of said cells contains a recombinant 
construct comprising a reporter gene operatively 
linked to a different endogenous transcriptional regulatory 
element of said target organism such that said 
transcriptional regulatory element regulates the expression 
of said reporter gene, wherein said plurality of 
cells comprises an ensemble of the transcriptional 
regulatory elements of said organism sufficient to 
model the transcriptional responsiveness of said organism 
to a drug; 

(b) contacting each of said cells with a candidate drug 
under conditions, wherein said cells maintain homeostasis; 

(c) detecting reporter gene product signals from each of 
said cells; 

(d) comparing said reporter gene product signals from 
each of said cells before and after contacting each of 
said cells with said candidate drug to obtain a drug 
response profile; 

wherein said drug response profile provides a model of 
the transcriptional responsiveness of said organism to 
said candidate drug. 
2. A method according to claim 1, said ensemble comprising 
a majority of all different transcriptional regulatory 
elements of said organism. 
3. A method according to claim 1, said drug being a 
candidate human therapeutic. 

4. A method according to claim 1, wherein said cells are 
yeast cells. 

5. A method according to claim 1, wherein said cells are 
bacterial cells. 

6. A method according to claim 1, wherein said cells are 
human cells. 

7. A method according to claim 1, wherein the reporter 
gene is the lacZ gene, the SUC2 gene, or a gene encoding a 
green fluorescent protein. 

e U k 8 a^ouS eC ° nlin8 1 ° daim WherCin Said 06118 « 
Contributed by Ronald W. Davis, December 27, 1996 

ABSTRACT cDNA microarray technology is used to profile 
complex diseases and discover novel disease-related genes. In 
inflammatory disease such as rheumatoid arthritis, expression 
patterns of diverse cell types contribute to the pathology. We 
have monitored gene expression in this disease state with a 
microarray of selected human genes of probable significance in 
inflammation as well as with genes expressed in peripheral 
human blood cells. Messenger RNA from cultured macrophages, 
chondrocyte cell lines, primary chondrocytes, and synoviocytes 
provided expression profiles for the selected cytokines, chemo- 
kines, DNA binding proteins, and matrix-degrading metal- 
loproteinases. Comparisons between tissue samples of rheuma- 
toid arthritis and inflammatory bowel disease verified the in- 
volvement of many genes and revealed novel participation of the 
cytokine interleukin 3, chemokine Groα and the metalloproteinase 
matrix metallo-elastase in both diseases. From the 
peripheral blood library, tissue inhibitor of metalloproteinase 1, 
ferritin light chain, and manganese superoxide dismutase genes 
were identified as expressed differentially in rheumatoid arthri- 
tis compared with inflammatory bowel disease. These results 
successfully demonstrate the use of the cDNA microarray system 
as a general approach for dissecting human diseases. 

The recently described cDNA microarray or DNA-chip tech- 
nology allows expression monitoring of hundreds and thou- 
sands of genes simultaneously and provides a format for 
identifying genes as well as changes in their activity (1, 2). 
Using this technology, two-color fluorescence patterns of 
differential gene expression in the root versus the shoot tissue 
of Arabidopsis were obtained in a specific array of 48 genes (1). 
In another study using a 1000 gene array from a human 
peripheral blood library, novel genes expressed by T cells were 
identified upon heat shock and protein kinase C activation (3). 

The technology uses cDNA sequences or cDNA inserts of a 
library for PCR amplification that are arrayed on a glass slide with 
high speed robotics at a density of 1000 cDNA sequences per cm². 
These microarrays serve as gene targets for hybridization to 
cDNA probes prepared from RNA samples of cells or tissues. A 
two-color fluorescence labeling technique is used in the prepa- 
ration of the cDNA probes such that a simultaneous hybridization 
but separate detection of signals provides the comparative anal- 
ysis and the relative abundance of specific genes expressed (1, 2). 
Microarrays can be constructed from specific cDNA clones of 
interest, a cDNA library, or a select number of open reading 
frames from a genome sequencing database to allow a large-scale 
functional analysis of expressed sequences. 
Because of the wide spectrum of genes and endogenous 
mediators involved, the microarray technology is well suited 
for analyzing chronic diseases. In rheumatoid arthritis (RA), 
inflammation of the joint is caused by the gene products of 
many different cell types present in the synovium and cartilage 
tissues plus those infiltrating from the circulating blood. The 
autoimmune and inflammatory nature of the disease is a 
cumulative result of genetic susceptibility factors and multiple 
responses, paracrine and autocrine in nature, from macro- 
phages, T cells, plasma cells, neutrophils, synovial fibroblasts, 
chondrocytes, etc. Growth factors, inflammatory cytokines 
(4), and the chemokines (5) are the important mediators of this 
inflammatory process. The ensuing destruction of the cartilage 
and bone by the invading synovial tissue includes the actions 
of prostaglandins and leukotrienes (6), and the matrix degrading 
metalloproteinases (MMPs). The MMPs are an important 
class of Zn-dependent metallo-endoproteinases that can col- 
lectively degrade the proteoglycan and collagen components of 
the connective tissue matrix (7). 

This paper presents a study in which the involvement of 
select classes of molecules in RA was examined. Also inves- 
tigated were 1000 human genes randomly selected from a 
peripheral human blood cell library. Their differential and 
quantitative expression analysis in cells of the joint tissue, in 
diseased RA tissue and in inflammatory bowel disease (IBD) 
tissues was conducted to demonstrate the utility of the mi- 
croarray method to analyze complex diseases by their pattern 
of gene expression. Such a survey provides insight not only into 
the underlying cause of the pathology, but also provides the 
opportunity to selectively target genes for disease intervention 
by appropriate drug development and gene therapies. 

METHODS 

Microarray Design, Development, and Preparation. Two ap- 
proaches for the fabrication of cDNA microarrays were used in 
this study. In the first approach, known human genes of probable 
significance in RA were identified. Regions of the clones, pref- 
erably 1 kb in length, were selected by their proximity to the 3' end 
of the cDNA and for areas of least identity to related and 
repetitive sequences. Primers were synthesized to amplify the 
target regions by standard PCR protocols (3). Products were 

Abbreviations: RA, rheumatoid arthritis; MMP, matrix-degrading 
metalloproteinase; IBD, inflammatory bowel disease; LPS, lipopolysaccharide; 
PMA, phorbol 12-myristate 13-acetate; TNF-α, tumor 
necrosis factor α; IL, interleukin; TGF-β, transforming growth factor 
β; GCSF, granulocyte colony-stimulating factor; MIP, macrophage 
inflammatory protein; MIF, migration inhibitory factor; HME, human 
matrix metallo-elastase; RANTES, regulated upon activation, normal 
T cell expressed and secreted; Gel, gelatinase; VCAM, vascular cell 
adhesion molecule; ICE, IL-1 converting enzyme; PUMP, putative 
metalloproteinase; MnSOD, manganese superoxide dismutase; TIMP, 
tissue inhibitor of metalloproteinase; MCP, macrophage chemotactic 
protein. 

To whom reprint requests should be sent at the present address: 
Roche Bioscience, S3-1, 3401 Hillview Avenue, Palo Alto, CA 94304. 
verified by gel electrophoresis and purified with Qiaquick 96-well 
purification kit (Qiagen, Chatsworth, CA), lyophilized (Savant), 
and resuspended in 5 µl of 3× standard saline citrate (SSC) buffer 
for arraying. In the second approach, the microarray containing 
the 1056 human genes from the peripheral blood lymphocyte 
library was prepared as described (3). 

Tissue Specimens. Rheumatoid synovial tissue was obtained 
from patients with late stage classic RA undergoing remedial 
synovectomy or arthroplasty of the knee. Synovial tissue was 
separated from any associated connective tissue or fat. One 
gram of each synovial specimen was subjected to RNA extrac- 
tion within 40 min of surgical excision, or explants were 
cultured in serum-free medium to examine any changes under 
in vitro conditions. For IBD, specimens of macroscopically 
inflamed lower intestinal mucosa were obtained from patients 
with Crohn disease undergoing remedial surgery. The hyper- 
trophied mucosal tissue was separated from underlying con- 
nective tissue and extracted for RNA. 

Cultured Cells. The Mono Mac-6 (MM6) monocytic cells 
(8) were grown in RPMI medium. Human chondrosarcoma 
SW1353 cells, primary human chondrocytes, and synoviocytes 
(9, 10) were cultured in DMEM; all culture media were 
supplemented with 10% fetal bovine serum, 100 µg/ml streptomycin, 
and 500 units/ml penicillin. Treatment of cells with 
lipopolysaccharide (LPS) endotoxin at 30 ng/ml, phorbol 
12-myristate 13-acetate (PMA) at 50 ng/ml, tumor necrosis 
factor α (TNF-α) at 50 ng/ml, interleukin (IL)-1β at 30 ng/ml 
or transforming growth factor-β (TGF-β) at 100 ng/ml is 
described in the figure legends. 
Fluorescent Probe, Hybridization, and Scanning. Isolation of 
mRNA, probe preparation, and quantitation with Arabidopsis 
control mRNAs was essentially as described (3) except for the 
following minor modification. Following the reverse transcriptase 
step, the appropriate Cy3- and Cy5-labeled samples were pooled; 
mRNA was degraded by heating the sample to 65°C for 10 min with 
the addition of 5 µl of 0.5 M NaOH plus 0.5 ml of 10 mM EDTA. 
The pooled cDNA was purified from unincorporated nucleotides 
by gel filtration in Centri-spin columns (Princeton Separations, 
Adelphia, NJ). Samples were lyophilized and dissolved in 6 µl of 
hybridization buffer (5× SSC plus 0.2% SDS). Hybridizations, 
washes, scanning, quantitation procedures, and pseudocolor representations 
of fluorescent images have been described (3). Scans 
for the two fluorescent probes were normalized either to the 
fluorescence intensity of Arabidopsis mRNAs spiked into the 
labeling reactions (see Figs. 2-4) or to the signal intensity of 
β-actin and glyceraldehyde-3-phosphate dehydrogenase 
(GAPDH; see Fig. 5). 
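
The normalization step described above can be sketched as follows. The text does not give the arithmetic, so this is one plausible reading under stated assumptions: the Cy5 channel is rescaled so that the control spots (spiked Arabidopsis mRNAs or housekeeping genes) have equal total intensity in both channels, after which a per-spot Cy5/Cy3 ratio is reported. The function names and intensities are hypothetical.

```python
def normalization_factor(cy3, cy5, control_spots):
    """Return the factor that equalizes total control-spot intensity across channels."""
    cy3_total = sum(cy3[s] for s in control_spots)
    cy5_total = sum(cy5[s] for s in control_spots)
    if cy5_total == 0:
        raise ValueError("control spots have no Cy5 signal")
    return cy3_total / cy5_total


def normalized_ratios(cy3, cy5, control_spots):
    """Return the normalized Cy5/Cy3 ratio for every spot with nonzero Cy3 signal."""
    factor = normalization_factor(cy3, cy5, control_spots)
    return {spot: cy5[spot] * factor / cy3[spot] for spot in cy3 if cy3[spot] > 0}


# Hypothetical scan intensities; HAT1 and HAT4 stand in for Arabidopsis controls.
cy3 = {"HAT1": 100.0, "HAT4": 200.0, "IL1B": 50.0, "TNFA": 400.0}
cy5 = {"HAT1": 50.0, "HAT4": 100.0, "IL1B": 100.0, "TNFA": 200.0}

print(normalized_ratios(cy3, cy5, ["HAT1", "HAT4"]))
```

After normalization the control spots sit at a ratio of 1, so a ratio well above or below 1 at another spot reflects differential expression rather than a difference in labeling or scanning efficiency.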

RESULTS 

Ninety-Six-Gene Microarray Design. The actions of cytokines, 
growth factors, chemokines, transcription factors, MMPs, pros- 
taglandins, and leukotrienes are well recognized in inflammatory 
disease, particularly RA (11-14). Fig. 1 displays the selected genes 
for this study and also includes control cDNAs of housekeeping 
genes such as β-actin and GAPDH and genes from Arabidopsis 
for signal normalization and quantitation (row A, columns 1-12). 

[Fig. 1: layout of the 96-element microarray. Legend: A. thaliana controls; human controls; cytokines and related genes; transcription factors and related genes; MMPs and related genes; chemokines; growth factors and related genes; other genes.] 

2152 Biochemistry: Heller et al. 

Defining Microarray Assay Conditions. Different lengths and 
concentrations of target DNA were tested by arraying PCR-amplified 
products ranging from 0.2 to 1.2 kb at concentrations 
of 1 µg/µl or less. No significant difference in the signal levels was 
observed within this range of target size and only with 0.2-kb 
length was a signal reduced upon an 8-fold dilution of the 1 µg/µl 
sample (data not shown). In this study the average length of the 
targets was 1 kb, with a few exceptions in the range of ~300 bp, 
arrayed at a concentration of 1 µg/µl. Normally one PCR provided 
sufficient material to fabricate up to 1000 microarray targets. 

In considering positional effects in the development of the 
targets for the microarrays, selection was biased toward the 3' 
proximal regions, because the signal was reduced if the target 
fragment was biased toward the 5' end (data not shown). This 
result was anticipated since the hybridizing probe is prepared by 
reverse transcription with oligo(dT)-primed mRNA and is richer 
in 3' proximal sequences. Cross-hybridizations of probes to 
targets of a gene family were analyzed with the matrix 
metalloproteinases as the example because they can show regions of 
sequence identities of greater than 70%. With collagenase-1 
(Col-1) and collagenase-2 (Col-2) genes as targets with up to 70% 
sequence identity, and stromelysin-1 (Strom-1) and stromelysin-2 
(Strom-2) genes with different degrees of identity, our results 
showed that a short region of overlap, even with 70-90% sequence 
identity, produced a low level of cross-hybridization. 
However, shorter regions of identity spread over the length of the 
target resulted in cross-hybridization (data not shown). For 
closely related genes, targets were designed by avoiding long 
stretches of homology. For members of a gene family two or more 
target regions were included to discriminate between specificity 
of signal versus cross-hybridization. 
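
The target-design rule described above (avoid long stretches of homology between related genes) can be illustrated with a small screening sketch. It is only an illustration under stated assumptions: the two sequences are assumed to be already aligned to equal length (gaps written as "-"), the function names are hypothetical, and the toy sequences are invented rather than taken from the MMP genes.

```python
from typing import List


def window_identity(a: str, b: str, window: int) -> List[float]:
    """Fraction of identical positions in each sliding window of two
    pre-aligned, equal-length sequences (gap characters never match)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if window < 1 or window > len(a):
        raise ValueError("window must be between 1 and the sequence length")
    matches = [x == y and x != "-" for x, y in zip(a, b)]
    return [sum(matches[i:i + window]) / window
            for i in range(len(a) - window + 1)]


def longest_identical_run(a: str, b: str) -> int:
    """Length of the longest uninterrupted stretch of identical positions."""
    longest = run = 0
    for x, y in zip(a, b):
        run = run + 1 if (x == y and x != "-") else 0
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    # Invented toy sequences, for illustration only.
    gene_a = "ACGTACGTTTGACCA"
    gene_b = "ACGTTCGTTAGACCA"
    print(window_identity(gene_a, gene_b, window=5))
    print(longest_identical_run(gene_a, gene_b))
```

A candidate target region could then be rejected when the longest identical run, or the identity in any window, exceeds a cutoff chosen by the experimenter; the text does not state such a cutoff, so none is assumed here.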

Monitoring Differential Expression in Cultured Cell Lines. In 
RA tissue, the monocyte/macrophage population plays a prominent 
role in phagocytic and immunomodulatory activities. Typically 
these cells, when triggered by an immunogen, produce the 
proinflammatory cytokines TNF and IL-1. We have used the 
monocyte cell line MM6 and monitored changes in gene expression 
upon activation with LPS endotoxin, a component of Gram-negative 
bacterial membranes, and PMA, which augments the 
action of LPS on TNF production (15). RNA was isolated at 
different times after induction and used for cDNA probe preparation. 
From this time course it was clear that TNF expression 
was induced within 15 min of treatment, reached maximum levels 
in 1 hr, remained high until 4 hr and subsequently declined (Fig. 
2A). Many other cytokine genes were also transiently activated, 
such as IL-1α and -β, IL-6, and granulocyte colony-stimulating 
factor (GCSF). Prominent chemokines activated were IL-8, macrophage 
inflammatory protein (MIP)-1β, more so than MIP-1α, 
and Groα or melanoma growth stimulatory factor. Migration 
inhibitory factor (MIF) expressed in the uninduced state declined 
in LPS-activated cells. Of the immediate early genes, the noticeable 
ones were c-fos, fra-1, c-jun, NF-κBp50, and IκB, with c-rel 
expression observed even in the uninduced state (Fig. 2B). These 
expression patterns are consistent with reported patterns of 
activation of certain LPS- and PMA-induced genes (12). Demonstrated 
here is the unique ability of this system to allow parallel 
visualization of a large number of gene activities over a period of 
time. 

The SW1353 cell line is derived from malignant tumors of the 
cartilage and behaves much like chondrocytes upon stimulation 
with TNF and IL-1 in the expression of MMPs (9). In 
addition to confirming our earlier observations with Northern 
blots on Strom-1, Col-1, and Col-3 expression (9), gelatinase 
(Gel) A, putative metalloproteinase (PUMP)-1, membrane-type 
matrix metalloproteinase, tissue inhibitors of matrix 
metalloproteinases or tissue inhibitor of metalloproteinase 1 
(TIMP-1), -2, and -3 were also expressed by these cells together 
with the human matrix metallo-elastase (HME; Fig. 3A). HME 
induction was estimated to be ~50-fold and was greater than 
that of any of the other MMPs examined (Fig. 3B). This result was 
unexpected because HME is reportedly expressed only by 
alveolar macrophage and placental cells (16). Expression of 
the cytokines and chemokines IL-6, IL-8, MIF, and MIP-1β 
was also noted. A variety of other genes, including certain 
transcription factors, were also up-regulated (Fig. 3), but the 
overall time-dependent expression of genes in the SW1353 
cells was qualitatively distinct from the MM6 cells. 

Quantitation of differential gene expression (Figs. 2B and 
3B) was achieved with the simultaneous hybridization of 
Cy3-labeled cDNA from untreated cells and Cy5-labeled 
cDNA from treated samples. The estimated increases in 
expression from these microarrays for a select number of genes 
including IL-1β, IL-8, MIP-1β, TNF, HME, Col-1, Col-3, 
Strom-1, and Strom-2 were compared with data collected from 
dot blot analysis. Results (not shown) were in close agreement 
and confirmed our earlier observations on the use of the 
microarray method for the quantitation of gene expression (3). 
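
As a rough illustration of two-color quantitation, the sketch below computes treated/untreated (Cy5/Cy3) intensity ratios and rescales them so that control elements average a ratio of 1. The rescaling step is an assumption made for the example (the article mentions Arabidopsis controls for signal normalization but does not give the calculation), the function name is hypothetical, and all intensity values are invented.

```python
from typing import Dict, Iterable


def fold_changes(cy3: Dict[str, float],
                 cy5: Dict[str, float],
                 control_genes: Iterable[str]) -> Dict[str, float]:
    """Return Cy5/Cy3 ratios per array element, scaled so that the
    control elements have a mean ratio of 1."""
    controls = list(control_genes)
    if not controls:
        raise ValueError("at least one control element is required")
    control_ratios = [cy5[g] / cy3[g] for g in controls]
    scale = sum(control_ratios) / len(control_ratios)
    return {g: (cy5[g] / cy3[g]) / scale for g in cy3 if g in cy5}


if __name__ == "__main__":
    # Invented intensities (arbitrary units), not data from the study.
    untreated = {"control_1": 100.0, "control_2": 80.0, "TNF": 50.0, "MIF": 400.0}
    treated = {"control_1": 210.0, "control_2": 152.0, "TNF": 1000.0, "MIF": 200.0}
    for gene, ratio in sorted(fold_changes(untreated, treated,
                                           ["control_1", "control_2"]).items()):
        print(f"{gene}: {ratio:.2f}")
```

Ratios above 1 would indicate higher signal in the treated sample and ratios below 1 lower signal, after correcting for overall differences in labeling or scanning between the two channels.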

Expression Profiles in Primary Chondrocytes and Synoviocytes 
of Human RA Tissue. Given the sensitivity and the 
specificity of this method, expression profiles of primary 
synoviocytes and chondrocytes from diseased tissue were 
examined. Without prior exposure to inducing agents, low level 
expression of c-jun, GCSF, IL-3, TNF-β, MIF, and RANTES 
(regulated upon activation, normal T cell expressed and secreted) 
was seen as well as expression of MMPs, GelA, 
Strom-1, Col-1, and the three TIMPs. In this case, Col-2 
hybridization was considered to be nonspecific because the 
second Col-2 target taken from the 3' end of the gene gave no 

A. Human synovial fibroblasts B. Human articular chondrocytes 




Fig. 3. Time course for IL-1β and TNF-induced SW1353 cells 
using the inflammation array (Fig. 1). (A) Pseudocolor representation 
of fluorescent scans corresponding to gene expression levels at each time 
point. (B I-IV) Relative levels of selected genes at different time points 
compared with time zero. 

Fig. 4. Expression profiles for early passage primary synoviocytes and 
chondrocytes isolated from RA tissue, cultured in the presence of 10% 
fetal calf serum and activated with PMA and IL-1β, or TNF and IL-1β, 
or TGF-β for 18 hr. The color bars provide a comparative calibration scale 
between arrays and are derived from the Arabidopsis mRNA samples that 
are introduced in equal amounts during probe preparation. 

signal. Treatment more so with PMA and IL-1 than with TNF and 
IL-1 produced a dramatic up-regulation in expression of 
several genes in both of these primary cell types. These genes 
are as follows: the cytokine IL-6, the chemokines IL-8 and 
Gro-1α, and the MMPs; Strom-1, Col-1, Col-3, and HME; and 
the adhesion molecule, vascular cell adhesion molecule 1 
(VCAM-1). The surprise again is HME expression in these 
primary cells, for reasons discussed above. From these results 
the expression profiles of synoviocytes and the chondrocytes 
appear very similar; the differences are more quantitative than 
qualitative. Treatment of the primary chondrocytes with the 
anabolic growth factor TGF-β had an interesting profile in that 
it produced a remarkable down-regulation of genes expressed 
in both the untreated and induced state (Fig. 4). 

Given the demonstrated effectiveness of this technology, a 
comparative analysis of two different inflammatory disease 
states was conducted with probes made from RA tissue and 
IBD samples. RA samples were from late stage rheumatoid 
synovial tissue, and IBD specimens were obtained from inflamed 
lower intestinal mucosa of patients with Crohn disease. 
With both the 96-element known gene microarray and the 
1000-gene microarray of cDNAs selected from a peripheral 
human blood cell library (3), distinct differences in gene 
expression patterns were evident. On the 96-gene array RA 
tissue samples from different affected individuals gave similar 
profiles (data not shown) as did different samples from the 
same individual (Fig. 5). These patterns were notably similar 
to those observed with primary synoviocytes and chondrocytes 
(Fig. 4). Included in the list of prominently up-regulated genes 
are IL-6, the MMPs Strom-1, Col-1, GelA, HME, and in 
certain samples PUMP, TIMPs, particularly TIMP-1 and 
TIMP-3, and the adhesion molecule VCAM. Discernible levels 
of macrophage chemotactic protein 1 (MCP-1), MIF, and 
RANTES were also noted. IBD samples were in comparison 
rather subdued, although IL-1 converting enzyme (ICE), 
TIMP-1, and MIF were notable in all three different IBD 
samples examined here. In IBD-A, one of three individual 
samples, ICE, VCAM, Groα, and MMP expression was more 
pronounced than in the others. 

Fig. 5. Expression profiles of RA tissue (A) and IBD tissue (B). 
mRNA from RA tissue samples obtained from the same individual was 
isolated directly after excision (RA 21.5A) or maintained in culture 
without serum for 2 hr (RA 21.5B) or for 6 hr (RA 21.5C). Profiles 
from tissue samples of two other individuals (data not shown) were 
remarkably similar to the ones shown here. IBD-A and IBD-CI are 
from mRNA samples prepared directly after surgery from two separate 
individuals. For the IBD-CII probe, the tissue sample was cultured 
in medium without serum for 2 hr before mRNA preparation. 

We also made use of a peripheral blood cDNA library (3) 
to identify genes expressed by lymphocytes infiltrating the 
inflamed tissues from the circulating blood. With the 1046-element 
array of randomly selected cDNAs from this library, 
probes made from RA and IBD samples showed hybridizations 
to a large number of genes. Of these, many were common 
between the two disease tissues while others were differentially 
expressed (data not shown). A complete survey of these genes 
was beyond the scope of this study, but for this report we 
picked three genes that were up-regulated in the RA tissue 
relative to IBD. These cDNAs were sequenced and identified 
by comparison to the GenBank database. They are TIMP-1, 
apoferritin light chain, and manganese superoxide dismutase 
(MnSOD). Differential expression of MnSOD was only observed 
in samples of RA tissue explants maintained in growth 
medium without serum for anywhere between 2 and 16 hr. These 
results also indicate that the expression profile of genes can be 
altered when explants are transferred to culture conditions. 

DISCUSSION 

The speed, ease, and feasibility of simultaneously monitoring 
differential expression of hundreds of genes with the cDNA 
microarray based system (1-3) is demonstrated here in the 
analysis of a complex disease such as RA. Many different cell 
types in the RA tissue (macrophages, lymphocytes, plasma cells, 
neutrophils, synoviocytes, chondrocytes, etc.) are known to contribute 
to the development of the disease with the expression of 
gene products known to be proinflammatory. They include the 
cytokines, chemokines, growth factors, MMPs, eicosanoids, and 
others (7, 11-14), and the design of the 96-element known gene 
microarray was based on this knowledge and depended on the 
availability of the genes. The technology was validated by confirming 
earlier observations on the expression of TNF by the 
monocyte cell line MM6, and of Col-1 and Col-3 expression in the 
chondrosarcoma cells and articular chondrocytes (9, 12). In our 
time-dependent survey the chronological order of gene activities 
in and between gene families was compared and the results have 
provided unprecedented profiles of the cytokines (TNF, IL-1, 
IL-6, GCSF, and MIF), chemokines (MIP-1α, MIP-1β, IL-8, and 
Gro-1), certain transcription factors, and the matrix 
metalloproteinases (GelA, Strom-1, Col-1, Col-3, HME) in the 
macrophage cell line MM6 and in the SW1353 chondrosarcoma cells. 

Earlier reports of cytokine production in the diseased state had 
established a model in which TNF is a major participant in RA. 
Its expression reportedly preceded that of the other cytokines and 
effector molecules (4). Our results strongly support this model, 
as demonstrated in the time course of the MM6 cells where TNF 
induction preceded that of IL-1α and IL-1β followed by IL-6 and 
GCSF. These expression profiles demonstrate the utility of the 
microarrays in determining the hierarchy of signaling events. 
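
One simple way to read an order of induction out of a time course is to record, for each gene, the first time point at which its fold change over time zero reaches a chosen threshold, and then sort by that onset time. The sketch below does this; the threshold of 2.0, the function name, and every number in the example are assumptions for illustration and are not values from Fig. 2.

```python
from typing import Dict, List, Sequence


def induction_order(time_course: Dict[str, Sequence[float]],
                    times: Sequence[float],
                    threshold: float = 2.0) -> List[str]:
    """Rank genes by the first time point at which fold change >= threshold.
    Genes that never reach the threshold are omitted; ties sort by name."""
    onset: Dict[str, float] = {}
    for gene, values in time_course.items():
        if len(values) != len(times):
            raise ValueError(f"{gene}: one value per time point is required")
        for t, value in zip(times, values):
            if value >= threshold:
                onset[gene] = t
                break
    return sorted(onset, key=lambda g: (onset[g], g))


if __name__ == "__main__":
    hours = [0.0, 0.25, 1.0, 4.0]
    # Invented fold changes relative to time zero.
    example = {
        "TNF": [1.0, 3.0, 10.0, 8.0],
        "IL-6": [1.0, 1.0, 1.5, 5.0],
        "MIF": [1.0, 0.8, 0.5, 0.4],
    }
    print(induction_order(example, hours))
```

With real data, measurement noise near the threshold can reorder genes with similar onset times, so an ordering obtained this way is best treated as a starting point for closer inspection.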

In the SW1353 chondrosarcoma cells, all the known MMPs and 
TIMPs were examined simultaneously. HME expression was 
discovered, which previously had been observed in only the 
stromal cells and alveolar macrophages of smoker's lungs and in 
placental tissue. Its presence in cells of the RA tissue is meaningful 
because its activity can cause significant destruction of 
elastin and basement membrane components (16, 17). Expression 
profiles of synovial fibroblasts and articular chondrocytes were 
remarkably similar and not too different from the SW1353 cells, 
indicating that the fibroblast and the chondrocyte can play equally 
aggressive roles in joint erosion. Prominent genes expressed were 
the MMPs, but chemokines and cytokines were also produced by 
these cells. The effect of the anabolic growth factor TGF-β was 
profoundly evident in demonstrating the down regulation of these 
catabolic activities. 

RA tissue samples undeniably reflected profiles similar to 
the cell types examined. Active genes observed were IL-3, IL-6, 
ICE, the MMPs including HME and TIMPs, chemokines IL-8, 
Groα, MIP, MIF, and RANTES, and the adhesion molecule 
VCAM. Of the growth factors, fibroblast growth factor β was 
observed most frequently. In comparison, the expression 
patterns in the other inflammatory state (i.e., IBD) were not 
as marked as in the RA samples, at least as obtained from the 
tissue samples selected for this study. 

As an alternative approach, the 1046 cDNA microarray of 
randomly selected genes from a lymphocyte library was used to 
identify genes expressed in RA tissue (3). Many genes on this 
array hybridized with probes made from both RA and IBD tissue 
samples. The results are not surprising because inflammatory 
tissue is abundantly supplied with cell types infiltrating from the 
circulating blood, made apparent also by the high levels of 
chemokine expression in RA tissue. Because of the magnitude of 
the effort required to identify all the hybridized genes, we have for 
this report chosen to describe only three differentially expressed 
genes mainly to verify this method of analysis. 

Of the large number of genes observed here, a fair number 
were already known as active participants in inflammatory dis- 
ease. These are TNF, IL-1, IL-6, IL-8, GCSF, RANTES, and 
VCAM. The novel participants not previously reported are 
HME, IL-3, ICE, and Groα. With our discovery of HME 
expression in RA, this gene becomes a target for drug intervention. 
ICE is a cysteine protease well known for its IL-1β processing 
activity (18), and recognized for its role in apoptotic cell death 
(19). Its expression in RA tissue is intriguing. IL-3 is recognized 
for its growth-promoting activity in hematopoietic cell lineages, is 
a product of activated T cells (20), and its expression in synovio- 
cytes and chondrocytes of RA tissue is a novel observation. 

Like IL-8, Groα is a C-X-C subgroup chemokine and is a 
potent neutrophil and basophil chemoattractant. It down-regulates 
the expression of types I and III interstitial collagens 
(21, 22) and is seen here produced by the MM6 cells, in primary 
synoviocytes, and in RA tissue. With the presence of RANTES, 
MCP, and MIP-1β, the C-C chemokines (23), migration and 
infiltration of monocytes, particularly T cells, into the tissue is 
also enhanced (5), aiding the trafficking and recruitment of 
leukocytes into the RA tissue. Their activation, phagocytosis, 
degranulation, and respiratory bursts could be responsible for 
the induction of MnSOD in RA. MnSOD is also induced by 
TNF and IL-1 and serves a protective function against oxida- 
tive damage. The induction of the ferritin light chain encoding 
gene in this tissue may be for reasons similar to those for 
MnSOD. Ferritin is the major intracellular iron storage protein 
and it is responsive to intracellular oxidative stress and reactive 
oxygen intermediates generated during inflammation (24, 25). 
The active expression of TIMP-1 in RA tissue, as detected by 
the 1000-element array, is no surprise because our results have 
repeatedly shown TIMP-1 to be expressed in the constitutive 
and induced states of RA cells and tissues. 

The suitability of the cDNA microarray technology for 
profiling diseases and for identifying disease related genes is 
well documented here. This technology could provide new 
targets for drug development and disease therapies, and in 
doing so allow for improved treatment of chronic diseases that 
are challenging because of their complexity. 

MEASUREMENT OF GENE EXPRESSION PROFILES 
IN TOXICITY DETERMINATION 

Field of the Invention 

The invention relates generally to methods for detecting and monitoring 
phenotypic changes in in vitro and in vivo systems for assessing and/or determining 
the toxicity of chemical compounds, and more particularly, the invention relates to a 
method for detecting and monitoring changes in gene expression patterns in in vitro 
and in vivo systems for determining the toxicity of drug candidates. 

BACKGROUND 

The ability to rapidly and conveniently assess the toxicity of new compounds 
is extremely important. Thousands of new compounds are synthesized every year, 
and many are introduced to the environment through the development of new 
commercial products and processes, often with little knowledge of their short term 
and long term health effects. In the development of new drugs, the cost of assessing 
the safety and efficacy of candidate compounds is becoming astronomical: It is 
estimated that the pharmaceutical industry spends an average of about 300 million 
dollars to bring a new pharmaceutical compound to market, e.g. Biotechnology, 13: 
226-228 (1995). A large fraction of these costs is due to the failure of candidate 
compounds in the later stages of the developmental process. That is, as the 
assessment of a candidate drug progresses from the identification of a compound as a 
drug candidate (for example, through relatively inexpensive binding assays or in vitro 
screening assays) to pharmacokinetic studies, to toxicity studies, to efficacy studies in 
model systems, to preliminary clinical studies, and so on, the costs of the associated 
tests and analyses increase tremendously. Consequently, it may cost several tens of 
millions of dollars to determine that a once promising candidate compound possesses 
a side effect or cross reactivity that renders it commercially infeasible to develop 
further. A great challenge of pharmaceutical development is to remove from further 
consideration as early as possible those compounds that are likely to fail in the later 
stages of drug testing. 

Drug development programs are clearly structured with this objective in mind; 
however, rapidly escalating costs have created a need to develop even more stringent 
and less expensive screens in the early stages to identify false leads as soon as 
possible. Toxicity assessment is an area where such improvements may be made, for 
both drug development and for assessing the environmental, health, and safety effects 
of new compounds in general. 

Typically the toxicity of a compound is determined by administering the 
compound to one or more species of test animal under controlled conditions and by 
monitoring the effects on a wide range of parameters. The parameters include such 
things as blood chemistry, weight gain or loss, a variety of behavioral patterns, muscle 
tone, body temperature, respiration rate, lethality, and the like, which collectively 
provide a measure of the state of health of the test animal. The degree of deviation of 
such parameters from their normal ranges gives a measure of the toxicity of a 
compound. Such tests may be designed to assess the acute, prolonged, or chronic 
toxicity of a compound. In general, acute tests involve administration of the test 
chemical on one occasion. The period of observation of the test animals may be as 
short as a few hours, although it is usually at least 24 hours and in some cases it may 
be as long as a week or more. In general, prolonged tests involve administration of 
the test chemical on multiple occasions. The test chemical may be administered one 
or more times each day, irregularly as when it is incorporated in the diet, at specific 
times such as during pregnancy, or in some cases regularly but only at weekly 
intervals. Also, in the prolonged test the experiment is usually conducted for not less 
than 90 days in the rat or mouse or a year in the dog. In contrast to the acute and 
prolonged types of test, the chronic toxicity tests are those in which the test chemical 
is administered for a substantial portion of the lifetime of the test animal. In the case 
of the mouse or rat, this is a period of 2 to 3 years. In the case of the dog, it is for 5 to 
7 years. 

Significant costs are incurred in establishing and maintaining large cohorts of 
test animals for such assays, especially the larger animals in chronic toxicity assays. 
Moreover, because of species specific effects, passing such toxicity tests does not 
ensure that a compound is free of toxic effects when used in humans. Such tests do, 
however, provide a standardized set of information for judging the safety of new 
compounds, and they provide a database for giving preliminary assessments of related 
compounds. An important area for improving toxicity determination would be the 
identification of new observables which are predictive of the outcome of the 
expensive and tedious animal assays. 

In other medical fields, there has been significant interest in applying recent 
advances in biotechnology, particularly in DNA sequencing, to the identification and 
study of differentially expressed genes in healthy and diseased organisms, e.g. Adams 
et al, Science, 252: 1651-1656 (1991); Matsubara et al, Gene, 135: 265-274 (1993); 
Rosenberg et al, International patent application, PCT/US95/01863. The objectives 
of such applications include increasing our knowledge of disease processes, 
identifying genes that play important roles in the disease process, and providing 
diagnostic and therapeutic approaches that exploit the expressed genes or their 
products. While such approaches are attractive, those based on exhaustive, or even 
sampled, sequencing of expressed genes are still beset by the enormous effort 
required: It is estimated that 30-35 thousand different genes are expressed in a typical 
mammalian tissue in any given state, e.g. Ausubel et al, Editors, Current Protocols, 
5.8.1-5.8.4 (John Wiley & Sons, New York, 1992). Determining the sequences of 
even a small sample of that number of gene products is a major enterprise, requiring 
industrial-scale resources. Thus, the routine application of massive sequencing of 
expressed genes is still beyond current commercial technology. 

The availability of new assays for assessing the toxicity of compounds, such 
as candidate drugs, that would provide more comprehensive and precise information 
about the state of health of a test animal would be highly desirable. Such additional 
assays would preferably be less expensive, more rapid, and more convenient than 
current testing procedures, and would at the same time provide enough information to 
make early judgments regarding the safety of new compounds. 

Summary of the Invention 
An object of the invention is to provide a new approach to toxicity assessment 
based on an examination of gene expression patterns, or profiles, in in vitro or in vivo 
test systems. 

Another object of the invention is to provide a database on which to base 
decisions concerning the toxicological properties of chemicals, particularly drug 
candidates. 

A further object of the invention is to provide a method for analyzing gene 
expression patterns in selected tissues of test animals. 

A still further object of the invention is to provide a system for identifying 
genes which are differentially expressed in response to exposure to a test compound. 

Another object of the invention is to provide a rapid and reliable method for 
correlating gene expression with short term and long term toxicity in test animals. 

Another object of the invention is to identify genes whose expression is 
predictive of deleterious toxicity. 

The invention achieves these and other objects by providing a method for 
massively parallel signature sequencing of genes expressed in one or more selected 
tissues of an organism exposed to a test compound. An important feature of the 
invention is the application of novel DNA sorting and sequencing methodologies that 
permit the formation of gene expression profiles for selected tissues by determining 
the sequence of portions of many thousands of different polynucleotides in parallel. 
Such profiles may be compared with those from tissues of control organisms at single 
or multiple time points to identify expression patterns predictive of toxicity. 

The sorting methodology of the invention makes use of oligonucleotide tags 
that are members of a minimally cross-hybridizing set of oligonucleotides. The 
sequences of oligonucleotides of such a set differ from the sequences of every other 
member of the same set by at least two nucleotides. Thus, each member of such a set 
cannot form a duplex (or triplex) with the complement of any other member with fewer 
than two mismatches. Complements of oligonucleotide tags of the invention, referred 
to herein as "tag complements," may comprise natural nucleotides or non-natural 
nucleotide analogs. Preferably, tag complements are attached to solid phase supports. 
Such oligonucleotide tags when used with their corresponding tag complements 
provide specificity of hybridization for sorting polynucleotides, such as cDNAs. 
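
The defining property of a minimally cross-hybridizing set (every member differs from every other member by at least two nucleotides) can be checked mechanically. The sketch below is an illustration only: it compares equal-length tags position by position, ignores shifted alignments and duplex stability, and builds a set with a simple greedy scan. It is not the algorithm of Figure 1, and all function names are hypothetical.

```python
from itertools import combinations, product
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_minimally_cross_hybridizing(tags: Sequence[str],
                                   min_differences: int = 2) -> bool:
    """True if every pair of tags differs in at least `min_differences` positions."""
    return all(hamming(a, b) >= min_differences
               for a, b in combinations(tags, 2))


def greedy_tag_set(length: int, alphabet: str = "ACGT",
                   min_differences: int = 2) -> List[str]:
    """Scan all sequences of the given length in lexicographic order and keep
    each one that differs sufficiently from every tag kept so far."""
    chosen: List[str] = []
    for candidate in map("".join, product(alphabet, repeat=length)):
        if all(hamming(candidate, tag) >= min_differences for tag in chosen):
            chosen.append(candidate)
    return chosen


if __name__ == "__main__":
    tags = greedy_tag_set(4)
    print(len(tags), tags[:5])
    print(is_minimally_cross_hybridizing(tags))
```

The greedy scan enumerates every sequence of the requested length, so it is practical only for short tags; a realistic tag design would also need to account for factors this sketch leaves out.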

The polynucleotides to be sorted each have an oligonucleotide tag attached, 
such that different polynucleotides have different tags. As explained more fully 
below, this condition is achieved by employing a repertoire of tags substantially 
greater than the population of polynucleotides and by taking a sufficiently small 
sample of tagged polynucleotides from the full ensemble of tagged polynucleotides. 
After such sampling, when the populations of supports and polynucleotides are mixed 
under conditions which permit specific hybridization of the oligonucleotide tags with 
their respective complements, identical polynucleotides sort onto particular beads or 
regions. The sorted populations of polynucleotides can then be sequenced on the 
solid phase support by a "single-base" or "base-by-base" sequencing methodology, as 
described more fully below. 
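
The reason a large tag repertoire combined with a small sample yields mostly distinct tags can be seen with a short calculation. The sketch below rests on a simplifying assumption that is not stated in the text: each sampled molecule carries a tag drawn independently and uniformly from the repertoire. Under that assumption, the chance that a given sampled molecule shares its tag with none of the other n - 1 sampled molecules is (1 - 1/T)^(n - 1) for a repertoire of T tags. The function name and the example sizes are hypothetical.

```python
def fraction_with_unique_tag(repertoire_size: int, sample_size: int) -> float:
    """Expected fraction of sampled molecules whose tag is shared with no
    other sampled molecule, assuming tags are assigned uniformly at random."""
    if repertoire_size < 1 or sample_size < 1:
        raise ValueError("repertoire_size and sample_size must be positive")
    return (1.0 - 1.0 / repertoire_size) ** (sample_size - 1)


if __name__ == "__main__":
    repertoire = 10_000_000  # hypothetical number of distinct tags
    for sample in (10_000, 100_000, 1_000_000):
        fraction = fraction_with_unique_tag(repertoire, sample)
        print(f"sample of {sample}: {fraction:.4f} expected to carry a unique tag")
```

Shrinking the sample relative to the repertoire pushes this fraction toward 1, which matches the qualitative statement above about taking a sufficiently small sample.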

In one aspect, the method of the invention comprises the following steps: (a) 
administering the compound to a test organism; (b) extracting a population of mRNA 
molecules from each of one or more tissues of the test organism; (c) forming a 
separate population of cDNA molecules from each population of mRNA molecules 
extracted from the one or more tissues such that each cDNA molecule of the separate 
populations has an oligonucleotide tag attached, the oligonucleotide tags being 
selected from the same minimally cross-hybridizing set; (d) separately sampling each 
population of cDNA molecules such that substantially all different cDNA molecules 
within a separate population have different oligonucleotide tags attached; (e) sorting 
the cDNA molecules of each separate population by specifically hybridizing the 
oligonucleotide tags with their respective complements, the respective complements 
being attached as uniform populations of substantially identical complements in 
spatially discrete regions on one or more solid phase supports; (f) determining the 
nucleotide sequence of a portion of each of the sorted cDNA molecules of each 
separate population to form a frequency distribution of expressed genes for each of 
the one or more tissues; and (g) correlating the frequency distribution of expressed 
genes in each of the one or more tissues with the toxicity of the compound. 
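
Steps (f) and (g) can be pictured with a small sketch: count how often each sequence signature is observed in a tissue to form a frequency distribution, then compare the distribution from a treated animal with that from a control. Everything here is illustrative. The function names, the pseudo-frequency used to avoid division by zero, and the signatures themselves are assumptions, and a simple ratio is only one of many ways the comparison could be made.

```python
from collections import Counter
from typing import Dict, Iterable


def frequency_distribution(signatures: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each sequence signature observed in one tissue."""
    counts = Counter(signatures)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no signatures supplied")
    return {signature: n / total for signature, n in counts.items()}


def expression_ratios(treated: Dict[str, float],
                      control: Dict[str, float],
                      pseudo: float = 1e-6) -> Dict[str, float]:
    """Treated/control frequency ratio per signature; `pseudo` keeps the
    ratio finite when a signature is absent from one of the samples."""
    signatures = set(treated) | set(control)
    return {s: (treated.get(s, 0.0) + pseudo) / (control.get(s, 0.0) + pseudo)
            for s in signatures}


if __name__ == "__main__":
    # Invented signatures standing in for sequenced portions of sorted cDNAs.
    control_reads = ["GATCAAGT", "GATCAAGT", "GATCTTCA", "GATCGGAC"]
    treated_reads = ["GATCAAGT", "GATCTTCA", "GATCTTCA", "GATCTTCA"]
    ratios = expression_ratios(frequency_distribution(treated_reads),
                               frequency_distribution(control_reads))
    for signature, ratio in sorted(ratios.items()):
        print(f"{signature}: {ratio:.3g}")
```

Signatures whose ratios depart strongly from 1 across animals and time points would be candidates for the expression patterns that the method seeks to correlate with toxicity.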

An important aspect of the invention is the identification of genes whose 
expression is predictive of the toxicity of a compound. Once such genes are 
identified, they may be employed in conventional assays, such as reverse transcriptase 
polymerase chain reaction (RT-PCR) assays for gene expression. 

Brief Description of the Drawings 
Figure 1 is a flow chart representation of an algorithm for generating 
minimally cross-hybridizing sets of oligonucleotides. 

Figure 2 diagrammatically illustrates an apparatus for carrying out 
polynucleotide sequencing in accordance with the invention. 

Definitions 

"Complement" or "tag complement" as used herein in reference to 
oligonucleotide tags refers to an oligonucleotide to which an oligonucleotide tag 
specifically hybridizes to form a perfectly matched duplex or triplex. In embodiments 
where specific hybridization results in a triplex, the oligonucleotide tag may be 
selected to be either double stranded or single stranded. Thus, where triplexes are 
formed, the term "complement" is meant to encompass either a double stranded 
complement of a single stranded oligonucleotide tag or a single stranded complement 
of a double stranded oligonucleotide tag. 

The term "oligonucleotide" as used herein includes linear oligomers of natural 
or modified monomers or linkages, including deoxyribonucleosides, ribonucleosides, 
anomeric forms thereof, peptide nucleic acids (PNAs), and the like, capable of 
specifically binding to a target polynucleotide by way of a regular pattern of 
monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base 
stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually 
monomers are linked by phosphodiester bonds or analogs thereof to form 
oligonucleotides ranging in size from a few monomeric units, e.g. 3-4, to several tens 
of monomeric units. Whenever an oligonucleotide is represented by a sequence of 
letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'->3' 
order from left to right and that "A" denotes deoxyadenosine, "C" denotes 
deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine, unless 
otherwise noted. Analogs of phosphodiester linkages include phosphorothioate, 
phosphorodithioate, phosphoranilidate, phosphoramidate, and the like. Usually 
oligonucleotides of the invention comprise the four natural nucleotides; however, they 
may also comprise non-natural nucleotide analogs. It is clear to those skilled in the 
art when oligonucleotides having natural or non-natural nucleotides may be 
employed, e.g. where processing by enzymes is called for, usually oligonucleotides 
consisting of natural nucleotides are required. 

"Perfectly matched" in reference to a duplex means that the poly- or 
oligonucleotide strands making up the duplex form a double stranded structure with 
one another such that every nucleotide in each strand undergoes Watson-Crick 
basepairing with a nucleotide in the other strand. The term also comprehends the 
pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine 
bases, and the like, that may be employed. In reference to a triplex, the term means 
that the triplex consists of a perfectly matched duplex and a third strand in which 
every nucleotide undergoes Hoogsteen or reverse Hoogsteen association with a 
basepair of the perfectly matched duplex. Conversely, a "mismatch" in a duplex 
between a tag and an oligonucleotide means that a pair or triplet of nucleotides in the 
duplex or triplex fails to undergo Watson-Crick and/or Hoogsteen and/or reverse 
Hoogsteen bonding. 

As used herein, "nucleoside" includes the natural nucleosides, including
2'-deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA
Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in reference to
nucleosides includes synthetic nucleosides having modified base moieties and/or
modified sugar moieties, e.g. described by Scheit, Nucleotide Analogs (John Wiley,
New York, 1980); Uhlman and Peyman, Chemical Reviews, 90: 543-584 (1990); or
the like, with the only proviso that they are capable of specific hybridization. Such
analogs include synthetic nucleosides designed to enhance binding properties, reduce
complexity, increase specificity, and the like.

As used herein, "sequence determination" or "determining a nucleotide
sequence" in reference to polynucleotides includes determination of partial as well as
full sequence information of the polynucleotide. That is, the term includes sequence
comparisons, fingerprinting, and like levels of information about a target
polynucleotide, as well as the express identification and ordering of nucleosides,
usually each nucleoside, in a target polynucleotide. The term also includes the
determination of the identification, ordering, and locations of one, two, or three of the
four types of nucleotides within a target polynucleotide. For example, in some
embodiments sequence determination may be effected by identifying the ordering and
locations of a single type of nucleotide within the target polynucleotide
"CATCGC ..." so that its sequence is represented as a binary code, e.g. "100101 ..." for
"C-(not C)-(not C)-C-(not C)-C ..." and the like.

As used herein, the term "complexity" in reference to a population of
polynucleotides means the number of different species of molecule present in the
population.
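
The binary representation mentioned in the definition of sequence determination can be sketched in a few lines of Python. This is only an illustration: the function name `binary_signature` and the example input are assumptions chosen to reproduce the "100101" pattern quoted in the text, not part of the disclosed method.

```python
def binary_signature(sequence: str, base: str = "C") -> str:
    """Encode a sequence as '1' where `base` occurs and '0' elsewhere.

    Hypothetical helper: it records only the ordering and locations of a
    single type of nucleotide, as in the partial sequence determination
    described in the text.
    """
    return "".join("1" if nt == base else "0" for nt in sequence.upper())


# Illustrative input whose cytosine positions follow the pattern
# C-(not C)-(not C)-C-(not C)-C.
example = binary_signature("CATCGC")
```

The same helper can be called with a different `base` argument to track any one of the four nucleotide types.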

As used herein, the terms "gene expression profile" and "gene expression
pattern," which are used equivalently, mean a frequency distribution of sequences of
portions of cDNA molecules sampled from a population of tag-cDNA conjugates.
Generally, the portions of sequence are sufficiently long to uniquely identify the
cDNA from which the portion arose. Preferably, the total number of sequences
determined is at least 1000; more preferably, the total number of sequences
determined in a gene expression profile is at least ten thousand.

As used herein, "test organism" means any in vitro or in vivo system which
provides measurable responses to exposure to test compounds. Typically, test
organisms may be mammalian cell cultures, particularly of specific tissues, such as
hepatocytes, neurons, kidney cells, colony forming cells, or the like, or test organisms
may be whole animals, such as rats, mice, hamsters, guinea pigs, dogs, cats, rabbits,
pigs, monkeys, and the like.

Detailed Description of the Invention 
The invention provides a method for determining the toxicity of a compound
by analyzing changes in the gene expression profiles in selected tissues of test
organisms exposed to the compound. The invention also provides a method of
identifying toxicity markers consisting of individual genes or a group of genes that is
expressed acutely and which is correlated with prolonged or chronic toxicity, or
suggests that the compound will have an undesirable cross reactivity. Gene
expression profiles are generated by sequencing portions of cDNA molecules
constructed from mRNA extracted from tissues of test organisms exposed to the
compound being tested. As used herein, the term "tissue" is employed with its usual
medical or biological meaning, except that in reference to an in vitro test system, such
as a cell culture, it simply means a sample from the culture. Gene expression profiles
derived from test organisms are compared to gene expression profiles derived from
control organisms to determine the genes which are differentially expressed in the test
organism because of exposure to the compound being tested. In both cases, the
sequence information of the gene expression profiles is obtained by massively parallel
signature sequencing of cDNAs, which is implemented in steps (c) through (f) of the
above method.
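
As a rough illustration of comparing a test profile with a control profile, the following Python sketch treats a gene expression profile as a frequency distribution of signature sequences. The function names, the pseudocount, and the two-fold threshold are assumptions made for the example; the text does not prescribe any particular comparison statistic.

```python
from collections import Counter


def expression_profile(signatures):
    """Return the relative frequency of each signature sequence."""
    counts = Counter(signatures)
    total = sum(counts.values())
    return {sig: n / total for sig, n in counts.items()}


def differential(test, control, min_ratio=2.0, pseudo=1e-6):
    """List signatures whose frequency ratio (test/control) changes by at
    least `min_ratio` in either direction. The pseudocount only avoids
    division by zero for signatures absent from one profile."""
    changed = {}
    for sig in set(test) | set(control):
        ratio = (test.get(sig, 0.0) + pseudo) / (control.get(sig, 0.0) + pseudo)
        if ratio >= min_ratio or ratio <= 1.0 / min_ratio:
            changed[sig] = ratio
    return changed


# Hypothetical signature reads from control and exposed tissue.
control = expression_profile(["AAA"] * 6 + ["CCC"] * 2 + ["GGG"] * 2)
exposed = expression_profile(["AAA"] * 2 + ["CCC"] * 6 + ["GGG"] * 2)
changed = differential(exposed, control)
```

In this toy data set the signatures "AAA" and "CCC" are flagged, while "GGG", whose frequency is unchanged, is not.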

Toxicity Assessment 
Procedures for designing and conducting toxicity tests in in vitro and in vivo
systems are well known, and are described in many texts on the subject, such as Loomis,
Essentials of Toxicology, 4th Ed. ([illegible], New York, 1996);
Echobichon, The Basics of Toxicity Testing (CRC Press, Boca Raton, [illegible]);
[illegible], editor, In Vitro Toxicity Testing (Marcel Dekker, New York, [illegible]);
and the like. [...] Since in most cases, the extraction of tissue as called for in the
method of the invention requires sacrificing the test [illegible] control [...]

[...] toxicity study, [...] for selecting the appropriate test organism for the
compound being tested, route of administration, dose ranges, and the like. Water or
physiological saline [illegible] is the solute of choice for the test compound since
these solvents permit administration by a variety of routes. When this is not possible
because of [illegible], it is necessary to resort to the use of vegetable oils such as
corn oil or even organic solvents, of which propylene glycol is commonly used.
Whenever [illegible] the use of suspensions or emulsions should be avoided except for
[illegible] administration. Regardless of the route of administration, the volume
required to administer a given dose is limited by the size of the animal that is used.
It is desirable to keep the volume of each dose uniform within and between groups of
animals [...] exceed 0.005 ml per gram of animal. Even when aqueous or physiological
saline [...] although such solutions are ordinarily thought of as being innocuous. The
intravenous LD50 of distilled water in the mouse is approximately 0.044 ml per gram
and that of isotonic saline is 0.068 ml per gram of mouse.

When a compound is to be administered by inhalation, special techniques for
generating test atmospheres are necessary. Dose estimation becomes very
complicated. The methods usually involve aerosolization or nebulization of fluids
containing the compound. If the agent to be tested is a fluid that has an appreciable
[...] conditions. Under these conditions, dose is estimated from the
volume of air inhaled per unit time, the temperature of the solution, and the vapor
pressure of the agent involved. Gases are metered from reservoirs. When particles of
[illegible] are to be administered, unless the particle size is less than about [illegible]
the particles will not reach the terminal alveolar sacs in the lungs. A variety of
apparatuses and chambers are available to perform studies for detecting effects of
irritant or other toxic endpoints when they are administered by inhalation. The
preferred method of administering an agent to animals is via the oral route, either by
intubation or by incorporating the agent in the feed.

Preferably, in designing a toxicity assessment, two or more species should be 
employed that handle the test compound as similarly to man as possible in terms of 
metabolism, absorption, excretion, tissue storage, and the like. Preferably, multiple 
doses or regimens at different concentrations should be employed to establish a dose- 
response relationship with respect to toxic effects. And preferably, the route of 
administration to the test animal should be the same as, or as similar as possible to,
the route of administration of the compound to man. Effects obtained by one route of 
administration to test animals are not a priori applicable to effects by another route of 
administration to man. For example, food additives for man should be tested by 
admixture of the material in the diet of the test animals. 

Acute toxicity tests consist of administering a compound to test organisms on
one occasion. The purpose of such a test is to determine the symptomatology
consequent to administration of the compound and to determine the degree of lethality
of the compound. The initial procedure is to perform a series of range-finding doses
of the compound in a single species. This necessitates selection of a route of
administration, preparation of the compound in a form suitable for administration by
the selected route, and selection of an appropriate species. Preferably, initial acute
toxicity studies are performed on either rats or mice because of their low cost, their
availability, and the availability of abundant toxicologic reference data on these
species. Prolonged toxicity tests consist of administering a compound to test
organisms repeatedly, usually on a daily basis, over a period of 3 to 4 months. Two
practical factors are encountered that place constraints on the design of such tests:
First, the available routes of administration are limited because the route selected
must be suitable for repeated administration without inducing harmful effects. And
second, blood, urine, and perhaps other samples should be taken repeatedly without
inducing significant harm to the test animals. Preferably, in the method of the
invention the gene expression profiles are obtained in conjunction with the
measurement of the traditional toxicologic parameters, such as listed in the table
below:

Hematology                      Blood Chemistry                            Urine Analyses

erythrocyte count               sodium                                     pH
total leukocyte count           potassium                                  specific gravity
differential leukocyte count    chloride                                   total protein
hematocrit                      calcium                                    sediment
hemoglobin                      carbon dioxide                             glucose
                                serum glutamic-pyruvic transaminase        ketones
                                serum glutamic-oxaloacetic transaminase    bilirubin
                                serum protein electrophoresis
                                blood sugar
                                blood urea nitrogen
                                total serum protein
                                serum albumin
                                total serum bilirubin


Oligonucleotide Tags and Tag Complements
Oligonucleotide tags are members of a minimally cross-hybridizing set of
oligonucleotides. The sequences of oligonucleotides of such a set differ from the
sequences of every other member of the same set by at least two nucleotides. Thus,
each member of such a set cannot form a duplex (or triplex) with the complement of
any other member with less than two mismatches. Complements of oligonucleotide
tags, referred to herein as "tag complements," may comprise natural nucleotides or
non-natural nucleotide analogs. Preferably, tag complements are attached to solid 
phase supports. Such oligonucleotide tags when used with their corresponding tag 
complements provide a means of enhancing specificity of hybridization for sorting, 
tracking, or labeling molecules, especially polynucleotides. 
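
The defining property of a minimally cross-hybridizing set, a minimum number of differing positions between every pair of members, can be checked with a short Python sketch. The function names are hypothetical, and counting position-by-position differences between equal-length words is a simplifying assumption standing in for mismatches in a duplex. The eight words used as data are the exemplary A/G/T 4-mer subunits listed in Table II of this document.

```python
from itertools import combinations


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_mismatches(words) -> int:
    """Smallest number of differences found between any two words."""
    return min(mismatches(a, b) for a, b in combinations(words, 2))


def is_minimally_cross_hybridizing(words, min_diff: int) -> bool:
    """True if every pair of words differs in at least `min_diff` positions."""
    return min_pairwise_mismatches(words) >= min_diff


# Exemplary 4-mer subunits (Table II of this document).
TABLE_II_WORDS = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]
```

For these eight words every pair differs in three positions, so the check passes with `min_diff=3` and fails as soon as a word such as "GATA", which differs from "GATT" in one position, is added.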

Minimally cross-hybridizing sets of oligonucleotide tags and tag complements
may be synthesized either combinatorially or individually depending on the size of the
set desired and the degree to which cross-hybridization is sought to be minimized (or
stated another way, the degree to which specificity is sought to be enhanced). For
example, a minimally cross-hybridizing set may consist of a set of individually
synthesized 10-mer sequences that differ from each other by at least 4 nucleotides,
such set having a maximum size of 332 (when composed of 3 kinds of nucleotides
and counted using a computer program such as disclosed in Appendix Ic).
Alternatively, a minimally cross-hybridizing set of oligonucleotide tags may also be
assembled combinatorially from subunits which themselves are selected from a
minimally cross-hybridizing set. For example, a set of minimally cross-hybridizing
12-mers differing from one another by at least three nucleotides may be synthesized
by assembling 3 subunits selected from a set of minimally cross-hybridizing 4-mers
that each differ from one another by three nucleotides. Such an embodiment gives a
maximally sized set of 9^3, or 729, 12-mers. The number 9 is the number of
oligonucleotides listed by the computer program of Appendix Ia, which assumes, as
with the 10-mers, that only 3 of the 4 different types of nucleotides are used. The set
is described as "maximal" because the computer programs of Appendices Ia-c provide
the largest set for a given input (e.g. length, composition, difference in number of
nucleotides between members). Additional minimally cross-hybridizing sets may be
formed from subsets of such calculated sets.
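
The arithmetic behind the combinatorial repertoire can be made concrete with a small Python sketch. The helper names are hypothetical; the only facts taken from the text are that a tag is a concatenation of subunits drawn from one set, so that a set of k subunits assembled n at a time yields k^n tags (e.g. 9^3 = 729).

```python
from itertools import product


def repertoire_size(n_words: int, words_per_tag: int) -> int:
    """Number of distinct tags built from `words_per_tag` subunits."""
    return n_words ** words_per_tag


def assemble_tags(words, words_per_tag: int):
    """Yield every tag formed by concatenating subunits from `words`."""
    for combo in product(words, repeat=words_per_tag):
        yield "".join(combo)


# Nine 4-mer subunits assembled three at a time give 729 12-mers,
# far fewer than the 4**12 possible 12-mers.
n_tags = repertoire_size(9, 3)
fraction_of_all_12mers = n_tags / 4 ** 12
```

Enumerating with `assemble_tags` is practical only for small repertoires; for larger ones `repertoire_size` gives the count directly.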

Oligonucleotide tags may be single stranded and be designed for specific 
hybridization to single stranded tag complements by duplex formation or for specific 
hybridization to double stranded tag complements by triplex formation. 
Oligonucleotide tags may also be double stranded and be designed for specific 
hybridization to single stranded tag complements by triplex formation. 

When synthesized combinatorially, an oligonucleotide tag preferably consists
of a plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 9
nucleotides in length wherein each subunit is selected from the same minimally
cross-hybridizing set. In such embodiments, the number of oligonucleotide tags available
depends on the number of subunits per tag and on the length of the subunits. The
number is generally much less than the number of all possible sequences the length of
the tag, which for a tag n nucleotides long would be 4^n.

Complements of oligonucleotide tags attached to a solid phase support are 
used to sort polynucleotides from a mixture of polynucleotides each containing a tag. 
Complements of the oligonucleotide tags are synthesized on the surface of a solid 
phase support, such as a microscopic bead or a specific location on an array of 
synthesis locations on a single support, such that populations of identical sequences 
are produced in specific regions. That is, the surface of each support, in the case of a 
bead, or of each region, in the case of an array, is derivatized by only one type of 
complement which has a particular sequence. The population of such beads or regions 
contains a repertoire of complements with distinct sequences. As used herein in
reference to oligonucleotide tags and tag complements, the term "repertoire" means
the set of minimally cross-hybridizing oligonucleotides that make up the tags in
a particular embodiment or the corresponding set of tag complements.

The polynucleotides to be sorted each have an oligonucleotide tag attached,
such that different polynucleotides have different tags. As explained more fully
below, this condition is achieved by employing a repertoire of tags substantially
greater than the population of polynucleotides and by taking a sufficiently small
sample of tagged polynucleotides from the full ensemble of tagged polynucleotides.
After such sampling, when the populations of supports and polynucleotides are mixed
under conditions which permit specific hybridization of the oligonucleotide tags with
their respective complements, identical polynucleotides sort onto particular beads or
regions.
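
Why a small sample from a large repertoire tends to give each polynucleotide its own tag can be illustrated with a simple probability model. The model is an assumption introduced here, not taken from the text: each sampled conjugate is taken to carry a tag drawn uniformly and independently from the repertoire, so the chance that a given conjugate shares its tag with no other sampled conjugate is (1 - 1/T)^(m-1) for repertoire size T and sample size m.

```python
def unique_tag_fraction(repertoire_size: int, sample_size: int) -> float:
    """Expected fraction of sampled conjugates whose tag is not shared
    with any other sampled conjugate, under the uniform, independent
    tagging assumption stated above."""
    if repertoire_size < 1 or sample_size < 1:
        raise ValueError("sizes must be positive")
    return (1.0 - 1.0 / repertoire_size) ** (sample_size - 1)


# Hypothetical numbers: a repertoire of 8**8 tags sampled at 1% and at 10%.
T = 8 ** 8
small_sample = unique_tag_fraction(T, T // 100)
large_sample = unique_tag_fraction(T, T // 10)
```

Under this model the uniquely tagged fraction falls as the sample grows relative to the repertoire, which is the qualitative behavior the paragraph relies on.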

The nucleotide sequences of oligonucleotides of a minimally cross-hybridizing
set are conveniently enumerated by simple computer programs, such as those
exemplified by programs whose source codes are listed in Appendices Ia and Ib.
Program minhx of Appendix Ia computes all minimally cross-hybridizing sets having
4-mer subunits composed of three kinds of nucleotides. Program tagN of Appendix
Ib enumerates longer oligonucleotides of a minimally cross-hybridizing set. Similar
algorithms and computer programs are readily written for listing oligonucleotides of
minimally cross-hybridizing sets for any embodiment of the invention. Table I below
provides guidance as to the size of sets of minimally cross-hybridizing
oligonucleotides for the indicated lengths and number of nucleotide differences. The
above computer programs were used to generate the numbers.



Table I

                   Nucleotide Difference
                   between                  Maximal Size
Oligonucleotide    Oligonucleotides of      of Minimally         Size of            Size of
Word               Minimally Cross-         Cross-Hybridizing    Repertoire with    Repertoire with
Length             Hybridizing Set          Set                  Four Words         Five Words

 4                 3                        9                    6561               5.90 x 10^4
 6                 3                        27                   5.3 x 10^5         1.43 x 10^7
 7                 4                        27                   5.3 x 10^5         1.43 x 10^7
 7                 5                        8                    4096               3.28 x 10^4
 8                 3                        190                  1.30 x 10^9        2.48 x 10^11
 8                 4                        62                   1.48 x 10^7        9.16 x 10^8
 8                 5                        18                   1.05 x 10^5        1.89 x 10^6
 9                 5                        39                   2.31 x 10^6        9.02 x 10^7
10                 5                        332                  1.21 x 10^10
10                 6                        28                   6.15 x 10^5        1.72 x 10^7
11                 5                        187
18                 6                        ~25000

For some embodiments of the invention, where extremely large repertoires of
tags are not required, oligonucleotide tags of a minimally cross-hybridizing set may
be separately synthesized. Sets containing several hundred to several thousands, or
even several tens of thousands, of oligonucleotides may be synthesized directly by a
variety of parallel synthesis approaches, e.g. as disclosed in Frank et al, U.S. patent
4,689,405; Frank et al, Nucleic Acids Research, 11: 4365-4377 (1983); Matson et al,
Anal. Biochem., 224: 110-116 (1995); Fodor et al, International application
PCT/US93/04145; Pease et al, Proc. Natl. Acad. Sci., 91: 5022-5026 (1994);
Southern et al, J. Biotechnology, 35: 217-227 (1994); Brennan, International
application PCT/US94/05896; Lashkari et al, Proc. Natl. Acad. Sci., 92: 7912-7915
(1995); or the like.

Preferably, oligonucleotide tags of the invention are synthesized
combinatorially out of subunits between three and six nucleotides in length and
selected from the same minimally cross-hybridizing set. For oligonucleotides in this
range, the members of such sets may be enumerated by computer programs based on 
the algorithm of Fig. 1. 

The algorithm of Fig. 1 is implemented by first defining the characteristics of
the subunits of the minimally cross-hybridizing set, i.e. length, number of base
differences between members, and composition, e.g. do they consist of two, three, or
four kinds of bases. A table M_n, n=1, is generated (100) that consists of all possible
sequences of a given length and composition. An initial subunit S_1 is selected and
compared (120) with successive subunits S_i for i=n+1 to the end of the table.
Whenever a successive subunit has the required number of mismatches to be a
member of the minimally cross-hybridizing set, it is saved in a new table M_n+1 (125),
that also contains subunits previously selected in prior passes through step 120. For
example, in the first set of comparisons, M_2 will contain S_1; in the second set of
comparisons, M_3 will contain S_1 and S_2; in the third set of comparisons, M_4 will
contain S_1, S_2, and S_3; and so on. Similarly, comparisons in table M_j will be
between S_j and all successive subunits in M_j. Note that each successive table M_n+1
is smaller than its predecessors as subunits are eliminated in successive passes
through step 130. After every subunit of table M_n has been compared (140) the old
table is replaced by the new table M_n+1, and the next round of comparisons is
begun. The process stops (160) when a table M_n is reached that contains no
successive subunits to compare to the selected subunit S_i, i.e. M_n=M_n+1.
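
The table-filtering procedure described above can be sketched in Python as a greedy selection. This is an interpretation under stated assumptions, not the program of the Appendices: the function names are hypothetical, the starting table is taken in the lexicographic order produced by `itertools.product`, and the first word of each table is always the one selected.

```python
from itertools import combinations, product


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def minimally_cross_hybridizing_set(length: int, alphabet: str, min_diff: int):
    """Greedy sketch of the table-based selection.

    Start from the table of all words (M_1). In each pass, select the
    first remaining word and keep only the later words that have at least
    `min_diff` mismatches with it; the survivors form the next table.
    """
    table = ["".join(p) for p in product(alphabet, repeat=length)]
    selected = []
    while table:
        current = table[0]
        selected.append(current)
        table = [w for w in table[1:] if mismatches(current, w) >= min_diff]
    return selected


# 4-mer subunits over three kinds of nucleotides, at least 3 differences.
words = minimally_cross_hybridizing_set(4, "AGT", 3)
```

For 4-mers over three bases with three required differences this sketch returns nine words, the same set size the text reports for that case. A greedy pass is not guaranteed to find a largest possible set for other inputs, and the result depends on the starting order of the table.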

Preferably, minimally cross-hybridizing sets comprise subunits that make
approximately equivalent contributions to duplex stability as every other subunit in
the set. In this way, the stability of perfectly matched duplexes between every subunit
and its complement is approximately equal. Guidance for selecting such sets is
provided by published techniques for selecting optimal PCR primers and calculating
duplex stabilities, e.g. Rychlik et al, Nucleic Acids Research, 17: 8543-8551 (1989)
and 18: 6409-6412 (1990); Breslauer et al, Proc. Natl. Acad. Sci., 83: 3746-3750
(1986); Wetmur, Crit. Rev. Biochem. Mol. Biol., 26: 227-259 (1991); and the like.
For shorter tags, e.g. about 30 nucleotides or less, the algorithm described by Rychlik
and Wetmur is preferred, and for longer tags, e.g. about 30-35 nucleotides or greater,
an algorithm disclosed by Suggs et al, pages 683-693 in Brown, editor, ICN-UCLA
Symp. Dev. Biol., Vol. 23 (Academic Press, New York, 1981) may be conveniently
employed. Clearly, there are many approaches available to one skilled in the art for
designing sets of minimally cross-hybridizing subunits within the scope of the
invention. For example, to minimize the effects of different base-stacking energies of
terminal nucleotides when subunits are assembled, subunits may be provided that
have the same terminal nucleotides. In this way, when subunits are linked, the sum of
the base-stacking energies of all the adjoining terminal nucleotides will be the same,
thereby reducing or eliminating variability in tag melting temperatures.
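
A very rough way to screen subunits for similar contributions to duplex stability is to compare their base composition. The sketch below uses the familiar rule of thumb of 2 degrees C per A/T pair and 4 degrees C per G/C pair for short oligonucleotides; this rule is swapped in here purely for illustration and is not one of the nearest-neighbor algorithms cited above. The function name is hypothetical, and the words are the exemplary A/G/T 4-mer subunits listed in Table II of this document.

```python
def rule_of_thumb_tm(seq: str) -> int:
    """Crude stability score: 2 per A or T, 4 per G or C.

    Illustration only; it ignores base stacking, salt, and
    concentration effects.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


TABLE_II_WORDS = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]
scores = {word: rule_of_thumb_tm(word) for word in TABLE_II_WORDS}
```

Each of these eight subunits contains exactly one G and three A or T bases, so the crude score is the same for all of them, which is consistent with the stated preference for subunits of approximately equal stability.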

A "word" of terminal nucleotides, shown in italic below, may also be added to
each end of a tag so that a perfect match is always formed between it and a similar
terminal "word" on any other tag complement. Such an augmented tag would have
the form:

    W     W_1     W_2     ...    W_k-1     W_k     W
    W'    W_1'    W_2'    ...    W_k-1'    W_k'    W'


where the primed W's indicate complements. With ends of tags always forming 
perfectly matched duplexes, all mismatched words will be internal mismatches 
thereby reducing the stability of tag-complement duplexes that otherwise would have 
mismatched words at their ends. It is well known that duplexes with internal 
mismatches are significantly less stable than duplexes with the same mismatch at a 
terminus. 

A preferred embodiment of minimally cross-hybridizing sets are those whose 
subunits are made up of three of the four natural nucleotides. As will be discussed 
more fully below, the absence of one type of nucleotide in the oligonucleotide tags 
permits target polynucleotides to be loaded onto solid phase supports by use of the 
5'->3' exonuclease activity of a DNA polymerase. The following is an exemplary
minimally cross-hybridizing set of subunits each comprising four nucleotides selected
from the group consisting of A, G, and T:

Table II

Word:        w1      w2      w3      w4
Sequence:    GATT    TGAT    TAGA    TTTG

Word:        w5      w6      w7      w8
Sequence:    GTAA    AGTA    ATGT    AAAG

In this set, each member would form a duplex having three mismatched bases with
the complement of every other member.

Further exemplary minimally cross-hybridizing sets are listed below in Table
III. Clearly, additional sets can be generated by substituting different groups of
nucleotides, or by using subsets of known minimally cross-hybridizing sets.

Table III
Exemplary Minimally Cross-Hybridizing Sets of 4-mer Subunits

Set 1    Set 2    Set 3    Set 4    Set 5    Set 6

CATT     ACCC     AAAC     AAAG     AACA     AACG
CTAA     AGGG     ACCA     ACCA     ACAC     ACAA
TCAT     CACG     AGGG     AGGC     AGGG     AGGC
ACTA     CCGA     CACG     CACC     CAAG     CAAC
TACA     CGAC     CCGC     CCGG     CCGC     CCGG
TTTC     GAGC     CGAA     CGAA     CGCA     CGCA
ATCT     GCAG     GAGA     GAGA     GAGA     GAGA
AAAC     GGCA     GCAG     GCAC     GCCG     GCCC
         AAAA     GGCC     GGCG     GGAC     GGAG

Set 7          Set 8    Set 9    Set 10   Set 11   Set 12

[illegible]    AAGC     AAGG     ACAG     ACCG     ACGA
[illegible]    ACAA     ACAA     AACA     AAAA     AAAC
[illegible]    AGCG     AGCC     AGGC     AGGC     AGCG
[illegible]    CAAG     CAAC     CAAC     CACC     CACA
CCCA           CCCC     CCCG     CCGA     CCGA     CCAG
CGGC           CGGA     CGGA     CGCG     CGAG     CGGC
GACC           GACA     GACA     GAGG     GAGG     GAGG
GCGG           GCGG     GCGC     GCCC     GCAC     GCCC
GGAA           GGAC     GGAG     GGAA     GGCA     GGAA


The oligonucleotide tags of the invention and their complements are
conveniently synthesized on an automated DNA synthesizer, e.g. an Applied
Biosystems, Inc. (Foster City, California) model 392 or 394 DNA/RNA Synthesizer,
using standard chemistries, such as phosphoramidite chemistry, e.g. disclosed in the
following references: Beaucage and Iyer, Tetrahedron, 48: 2223-2311 (1992); Molko
et al, U.S. patent 4,980,460; Koster et al, U.S. patent 4,725,677; Caruthers et al, U.S.
patents 4,415,732; 4,458,066; and 4,973,679; and the like. Alternative chemistries,
e.g. resulting in non-natural backbone groups, such as phosphorothioate,
phosphoramidate, and the like, may also be employed provided that the resulting
oligonucleotides are capable of specific hybridization. In some embodiments, tags
may comprise naturally occurring nucleotides that permit processing or manipulation
by enzymes, while the corresponding tag complements may comprise non-natural
nucleotide analogs, such as peptide nucleic acids, or like compounds, that promote the
formation of more stable duplexes during sorting.

When microparticles are used as supports, repertoires of oligonucleotide tags
and tag complements may be generated by subunit-wise synthesis via "split and mix"
techniques, e.g. as disclosed in Shortle et al, International patent application
PCT/US93/03418 or Lyttle et al, Biotechniques, 19: 274-280 (1995). Briefly, the
basic unit of the synthesis is a subunit of the oligonucleotide tag. Preferably,
phosphoramidite chemistry is used and 3' phosphoramidite oligonucleotides are
prepared for each subunit in a minimally cross-hybridizing set, e.g. for the set first
listed above, there would be eight 4-mer 3'-phosphoramidites. Synthesis proceeds as
disclosed by Shortle et al or in direct analogy with the techniques employed to
generate diverse oligonucleotide libraries using nucleoside monomers, e.g. as
disclosed in Telenius et al, Genomics, 13: 718-725 (1992); Welsh et al, Nucleic Acids
Research, 19: 5275-5279 (1991); Grothues et al, Nucleic Acids Research, 21:
1321-1322 (1993); Hartley, European patent application 90304496.4; Lam et al, Nature,
354: 82-84 (1991); Zuckerman et al, Int. J. Pept. Protein Research, 40: 498-507
(1992); and the like. Generally, these techniques simply call for the application of
mixtures of the activated monomers to the growing oligonucleotide during the
coupling steps. Preferably, oligonucleotide tags and tag complements are synthesized
on a DNA synthesizer having a number of synthesis chambers which is greater than or
equal to the number of different kinds of words used in the construction of the tags.
That is, preferably there is a synthesis chamber corresponding to each type of word.
In this embodiment, words are added nucleotide-by-nucleotide, such that if a word
consists of five nucleotides there are five monomer couplings in each synthesis
chamber. After a word is completely synthesized, the synthesis supports are removed
from the chambers, mixed, and redistributed back to the chambers for the next cycle
of word addition. This latter embodiment takes advantage of the high coupling yields
of monomer addition, e.g. in phosphoramidite chemistries.
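
The split, couple, and mix cycle described above can be simulated with a short Python sketch. It is a toy model under stated assumptions: the function name is hypothetical, each bead is represented only by the tag sequence growing on it, beads are redistributed to chambers uniformly at random, and coupling is treated as error-free. The words are the exemplary 4-mer subunits listed in Table II of this document.

```python
import random


def split_and_mix(words, n_beads: int, n_cycles: int, seed: int = 0):
    """Simulate word-wise split-and-mix synthesis on `n_beads` supports."""
    rng = random.Random(seed)
    beads = [""] * n_beads
    for _ in range(n_cycles):
        # Split: distribute the beads among one chamber per word.
        chambers = [[] for _ in words]
        for tag in beads:
            chambers[rng.randrange(len(words))].append(tag)
        # Couple each chamber's word to its beads, then mix (pool) them.
        beads = [tag + word
                 for word, chamber in zip(words, chambers)
                 for tag in chamber]
    return beads


TABLE_II_WORDS = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]
beads = split_and_mix(TABLE_II_WORDS, n_beads=1000, n_cycles=3)
```

Every simulated bead ends up with a single tag of three words, and the number of distinct tags cannot exceed 8^3 = 512, the repertoire size for three words drawn from eight.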

Double stranded forms of tags may be made by separately synthesizing the
complementary strands followed by mixing under conditions that permit duplex
formation. Alternatively, double stranded tags may be formed by first synthesizing a
single stranded repertoire linked to a known oligonucleotide sequence that serves as a
primer binding site. The second strand is then synthesized by combining the single
stranded repertoire with a primer and extending with a polymerase. This latter
approach is described in Oliphant et al, Gene, 44: 177-183 (1986). Such duplex tags
may then be inserted into cloning vectors along with target polynucleotides for sorting
and manipulation of the target polynucleotide in accordance with the invention.

When tag complements are employed that are made up of nucleotides that
have enhanced binding characteristics, such as PNAs or oligonucleotide N3'->P5'
phosphoramidates, sorting can be implemented through the formation of D-loops
between tags comprising natural nucleotides and their PNA or phosphoramidate
complements, as an alternative to the "stripping" reaction employing the 3'->5'
exonuclease activity of a DNA polymerase to render a tag single stranded.

Oligonucleotide tags of the invention may range in length from 12 to 60
nucleotides or basepairs. Preferably, oligonucleotide tags range in length from 18 to
40 nucleotides or basepairs. More preferably, oligonucleotide tags range in length
from 25 to 40 nucleotides or basepairs. In terms of preferred and more preferred
numbers of subunits, these ranges may be expressed as follows:

Table IV
Numbers of Subunits in Tags in Preferred Embodiments

Monomers
in Subunit    Nucleotides in Oligonucleotide Tag
              (12-60)           (18-40)           (25-40)

3             4-20 subunits     6-13 subunits     8-13 subunits
4             3-15 subunits     4-10 subunits     6-10 subunits
5             2-12 subunits     3-8 subunits      5-8 subunits
6             2-10 subunits     3-6 subunits      4-6 subunits

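
The entries of Table IV can be reproduced by dividing each tag-length bound by the subunit length and discarding the remainder. The sketch below assumes this rounding rule because it matches every entry in the table; the text does not state how the entries were derived, and the function name is hypothetical.

```python
def subunit_range(monomers_per_subunit: int, min_len: int, max_len: int):
    """Return (fewest, most) subunits for a tag-length range, rounding
    both bounds down as the tabulated values appear to do."""
    return (min_len // monomers_per_subunit, max_len // monomers_per_subunit)


# Rebuild the table: rows are subunit lengths, columns are length ranges.
LENGTH_RANGES = [(12, 60), (18, 40), (25, 40)]
table_iv = {
    size: [subunit_range(size, lo, hi) for lo, hi in LENGTH_RANGES]
    for size in (3, 4, 5, 6)
}
```

For example, 5-mer subunits in a tag of 25 to 40 nucleotides give 5 to 8 subunits.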



Most preferably, oligonucleotide tags are single stranded and specific hybridization 
occurs via Watson-Crick pairing with a tag complement. 

Preferably, repertoires of single stranded oligonucleotide tags of the invention
contain at least [illegible] members; more preferably, repertoires of such tags contain at
least 1000 members; and most preferably, repertoires of such tags contain at least
10,000 members.

Triplex Tags

In embodiments where specific hybridization occurs via triplex formation,
coding of tag sequences follows the same principles as for duplex-forming tags;
however, there are further constraints on the selection of subunit sequences.
Generally, third strand association via Hoogsteen type of binding is most stable along
homopyrimidine-homopurine tracks in a double stranded target. Usually, base triplets
form in T-A*T or C-G*C motifs (where "-" indicates Watson-Crick pairing and "*"
indicates Hoogsteen type of binding); however, other motifs are also possible. For
example, Hoogsteen base pairing permits parallel and antiparallel orientations
between the third strand (the Hoogsteen strand) and the purine-rich strand of the
duplex to which the third strand binds, depending on conditions and the composition
of the strands. There is extensive guidance in the literature for selecting appropriate
sequences, orientation, conditions, nucleoside type (e.g. whether ribose or
deoxyribose nucleosides are employed), base modifications (e.g. methylated cytosine,
and the like) in order to maximize, or otherwise regulate, triplex stability as desired in
particular embodiments, e.g. Roberts et al, Proc. Natl. Acad. Sci., 88: 9397-9401
(1991); Roberts et al, Science, 258: 1463-1466 (1992); Roberts et al, Proc. Natl.
Acad. Sci., 93: 4320-4325 (1996); Distefano et al, Proc. Natl. Acad. Sci., 90:
1179-1183 (1993); Mergny et al, Biochemistry, 30: 9791-9798 (1991); Cheng et al, J. Am.
Chem. Soc., 114: 4465-4474 (1992); Beal and Dervan, Nucleic Acids Research, 20:
2773-2776 (1992); Beal and Dervan, J. Am. Chem. Soc., 114: 4976-4982 (1992);
Giovannangeli et al, Proc. Natl. Acad. Sci., 89: 8631-8635 (1992); Moser and Dervan,
Science, 238: 645-650 (1987); McShan et al, J. Biol. Chem., 267: 5712-5721 (1992);
Yoon et al, Proc. Natl. Acad. Sci., 89: 3840-3844 (1992); Blume et al, Nucleic Acids
Research, 20: 1777-1784 (1992); Thuong and Helene, Angew. Chem. Int. Ed. Engl.,
32: 666-690 (1993); Escude et al, Proc. Natl. Acad. Sci., 93: 4365-4369 (1996); and
the like. Conditions for annealing single-stranded or duplex tags to their
single-stranded or duplex complements are well known, e.g. Ji et al, Anal. Chem., 65:
1323-1328 (1993); Cantor et al, U.S. patent 5,482,836; and the like. Use of triplex tags has
the advantage of not requiring a "stripping" reaction with polymerase to expose the
tag for annealing to its complement.

Preferably, oligonucleotide tags of the invention employing triplex
hybridization are double stranded DNA and the corresponding tag complements are
single stranded. More preferably, 5-methylcytosine is used in place of cytosine in the
tag complements in order to broaden the range of pH stability of the triplex formed
between a tag and its complement. Preferred conditions for forming triplexes are
fully disclosed in the above references. Briefly, hybridization takes place in
concentrated salt solution, e.g. 1.0 M NaCl, 1.0 M potassium acetate, or the like, at
pH below 5.5 (or 6.5 if 5-methylcytosine is employed). Hybridization temperature
depends on the length and composition of the tag; however, for an 18-20-mer tag or
longer, hybridization at room temperature is adequate. Washes may be conducted
with less concentrated salt solutions, e.g. 10 mM sodium acetate, 100 mM MgCl2, pH
5.8, at room temperature. Tags may be eluted from their tag complements by
incubation in a similar salt solution at pH 9.0.

Minimally cross-hybridizing sets of oligonucleotide tags that form triplexes
may be generated by the computer program of Appendix Ic, or similar programs. An
exemplary set of double stranded 8-mer words is listed below in capital letters, with
the corresponding complements in small letters. Each such word differs from each of
the other words in the set by at least three base pairs.

                              Table V

                Exemplary Minimally Cross-Hybridizing
                Set of Double Stranded 8-mer Tags

5'-AAGGAGAG     5'-AAAGGGGA     5'-AGAGAAGA     5'-AGGGGGGG
3'-TTCCTCTC     3'-TTTCCCCT     3'-TCTCTTCT     3'-TCCCCCCC
3'-ttcctctc     3'-tttcccct     3'-tctcttct     3'-tccccccc

5'-AAAAAAAA     5'-AAGAGAGA     5'-AGGAAAAG     5'-GAAAGGAG
3'-TTTTTTTT     3'-TTCTCTCT     3'-TCCTTTTC     3'-CTTTCCTC
3'-tttttttt     3'-ttctctct     3'-tccttttc     3'-ctttcctc

5'-AAAAAGGG     5'-AGAAGAGG     5'-AGGAAGGA     5'-GAAGAAGG
3'-TTTTTCCC     3'-TCTTCTCC     3'-TCCTTCCT     3'-CTTCTTCC
3'-tttttccc     3'-tcttctcc     3'-tccttcct     3'-cttcttcc

5'-AAAGGAAG     5'-AGAAGGAA     5'-AGGGGAAA     5'-GAAGAGAA
3'-TTTCCTTC     3'-TCTTCCTT     3'-TCCCCTTT     3'-CTTCTCTT
3'-tttccttc     3'-tcttcctt     3'-tccccttt     3'-cttctctt
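
The three-base-pair separation stated for the words of Table V can be checked mechanically. The following is a minimal sketch, not the program of Appendix Ic; the function names are hypothetical, and the word list is the set of sixteen purine-strand words as read from the table.

```python
from itertools import combinations

# Purine-strand words of Table V, written 5'->3', as read from the table.
WORDS = [
    "AAGGAGAG", "AAAGGGGA", "AGAGAAGA", "AGGGGGGG",
    "AAAAAAAA", "AAGAGAGA", "AGGAAAAG", "GAAAGGAG",
    "AAAAAGGG", "AGAAGAGG", "AGGAAGGA", "GAAGAAGG",
    "AAAGGAAG", "AGAAGGAA", "AGGGGAAA", "GAAGAGAA",
]


def mismatches(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_mismatches(words):
    """Smallest mismatch count over every pair of words in the set."""
    return min(mismatches(a, b) for a, b in combinations(words, 2))


def pyrimidine_strand(word):
    """Watson-Crick partner of a purine word, written position by position (3'->5')."""
    return word.translate(str.maketrans("AG", "TC"))


if __name__ == "__main__":
    print("words:", len(WORDS))
    print("minimum pairwise mismatches:", min_pairwise_mismatches(WORDS))
    print("partner of AAGGAGAG:", pyrimidine_strand("AAGGAGAG"))
```

A set passes the check when the minimum pairwise mismatch count is at least the separation required of the minimally cross-hybridizing set.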
                              Table VI

          Repertoire Size of Various Double Stranded Tags
          That Form Triplexes with Their Tag Complements

                  Nucleotide
                  Difference
                  between
                  Oligonucleotides   Maximal Size
Oligonucleotide   of Minimally       of Minimally      Size of        Size of
Word              Cross-             Cross-            Repertoire     Repertoire
Length            Hybridizing Set    Hybridizing Set   with Four      with Five
                                                       Words          Words

 4                 2                   8               4096           3.2 x 10^4
 6                 3                   8               4096           3.2 x 10^4
 8                 3                  16               6.5 x 10^4     1.05 x 10^6
10                 5                   8               4096
15                 5                  92
20                 6                 765
20                 8                  92
20                10                 [illegible]
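
The repertoire sizes in Table VI follow from raising the size of the minimally cross-hybridizing set to the number of words per tag. A minimal sketch of that arithmetic is given below; the function name is hypothetical.

```python
def repertoire_size(set_size, words_per_tag):
    """Number of distinct tags when each of `words_per_tag` positions
    is filled independently from a set of `set_size` words."""
    return set_size ** words_per_tag


if __name__ == "__main__":
    for set_size in (8, 16):
        for words in (4, 5):
            print(set_size, words, repertoire_size(set_size, words))
```

For a set of 8 words this gives 4096 four-word tags and 32,768 (about 3.2 x 10^4) five-word tags, in agreement with the tabulated entries.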



Preferably, repertoires of double stranded oligonucleotide tags of the invention
contain at least 10 members; more preferably, repertoires of such tags contain at least
100 members. Preferably, words are between 4 and 8 nucleotides in length for
combinatorially synthesized double stranded oligonucleotide tags, and oligonucleotide
tags are between 12 and 60 base pairs in length. More preferably, such tags are
between 18 and 40 base pairs in length.



Solid Phase Supports 
Solid phase supports for use with the invention may have a wide variety of
forms, including microparticles, beads, and membranes, slides, plates, micromachined
chips, and the like. Likewise, solid phase supports of the invention may comprise a
wide variety of compositions, including glass, plastic, silicon, alkanethiolate-
derivatized gold, cellulose, low cross-linked and high cross-linked polystyrene, silica
gel, polyamide, and the like. Preferably, either a population of discrete particles is
employed such that each has a uniform coating, or population, of complementary
sequences of the same tag (and no other), or a single or a few supports are employed
with spatially discrete regions each containing a uniform coating, or population, of
complementary sequences to the same tag (and no other). In the latter embodiment,
the area of the regions may vary according to particular applications; usually, the
regions range in area from several µm^2, e.g. 3-5, to several hundred µm^2, e.g. 100-
500. Preferably, such regions are spatially discrete so that signals generated by
events, e.g. fluorescent emissions, at adjacent regions can be resolved by the detection
system being employed. In some applications, it may be desirable to have regions
with uniform coatings of more than one tag complement, e.g. for simultaneous
sequence analysis, or for bringing separately tagged molecules into close proximity.

Tag complements may be used with the solid phase support that they are
synthesized on, or they may be separately synthesized and attached to a solid phase
support for use, e.g. as disclosed by Lund et al, Nucleic Acids Research, 16: 10861-
10880 (1988); Albretsen et al, Anal. Biochem., 189: 40-50 (1990); Wolf et al, Nucleic
Acids Research, 15: 2911-2926 (1987); or Ghosh et al, Nucleic Acids Research, 15:
5353-5372 (1987). Preferably, tag complements are synthesized on and used with the
same solid phase support, which may comprise a variety of forms and include a
variety of linking moieties. Such supports may comprise microparticles or arrays, or
matrices, of regions where uniform populations of tag complements are synthesized.
A wide variety of microparticle supports may be used with the invention, including
microparticles made of controlled pore glass (CPG), highly cross-linked polystyrene,
acrylic copolymers, cellulose, nylon, dextran, latex, polyacrolein, and the like,
disclosed in the following exemplary references: Meth. Enzymol., Section A, pages
11-147, vol. 44 (Academic Press, New York, 1976); U.S. patents 4,678,814;
4,413,070; and 4,046,720; and Pon, Chapter 19, in Agrawal, editor, Methods in
Molecular Biology, Vol. 20, (Humana Press, Totowa, NJ, 1993). Microparticle
supports further include commercially available nucleoside-derivatized CPG and
polystyrene beads (e.g. available from Applied Biosystems, Foster City, CA);
derivatized magnetic beads; polystyrene grafted with polyethylene glycol (e.g.
TentaGel(TM), Rapp Polymere, Tubingen, Germany); and the like. Selection of the
support characteristics, such as material, porosity, size, shape, and the like, and the
type of linking moiety employed depends on the conditions under which the tags are
used. For example, in applications involving successive processing with enzymes,
supports and linkers that minimize steric hindrance of the enzymes and that facilitate
access to substrate are preferred. Other important factors to be considered in selecting
the most appropriate microparticle support include size uniformity, efficiency as a
synthesis support, degree to which surface area is known, and optical properties, e.g. as
explained more fully below, clear smooth beads provide instrumentational advantages
when handling large numbers of beads on a surface.

Exemplary linking moieties for attaching and/or synthesizing tags on 
microparticle surfaces are disclosed in Pon et al, Biotechniques, 6:768-775 (1988); 
Webb, U.S. patent 4,659,774; Barany et al, International patent application 
PCT/US91/06103; Brown et al, J. Chem. Soc. Commun., 1989: 891-893; Damha et
al, Nucleic Acids Research, 18: 3813-3821 (1990); Beattie et al, Clinical Chemistry,
39: 719-722 (1993); Maskos and Southern, Nucleic Acids Research, 20: 1679-1684
(1992); and the like. 

As mentioned above, tag complements may also be synthesized on a single 
(or a few) solid phase support to form an array of regions uniformly coated with tag 
complements. That is, within each region in such an array the same tag complement 
is synthesized. Techniques for synthesizing such arrays are disclosed in McGall et al, 
International application PCT/US93/03767; Pease et al, Proc. Natl. Acad. Sci., 91:
5022-5026 (1994); Southern and Maskos, International application
PCT/GB89/01114; Maskos and Southern (cited above); Southern et al, Genomics, 13:
1008-1017 (1992); and Maskos and Southern, Nucleic Acids Research, 21: 4663-
4669 (1993).

Preferably, the invention is implemented with microparticles or beads 
uniformly coated with complements of the same tag sequence. Microparticle supports 
and methods of covalently or noncovalently linking oligonucleotides to their surfaces 
are well known, as exemplified by the following references: Beaucage and Iyer (cited 
above); Gait, editor. Oligonucleotide Synthesis: A Practical Approach (IRL Press, 
Oxford, 1984); and the references cited above. Generally, the size and shape of a
microparticle is not critical; however, microparticles in the size range of a few, e.g. 1-
2, to several hundred, e.g. 200-1000 µm diameter are preferable, as they facilitate the
construction and manipulation of large repertoires of oligonucleotide tags with 
minimal reagent and sample usage. 

In some preferred applications, commercially available controlled-pore glass 
(CPG) or polystyrene supports are employed as solid phase supports in the invention. 
Such supports come available with base-labile linkers and initial nucleosides attached, 
e.g. Applied Biosystems (Foster City, CA). Preferably, microparticles having pore 
size between 500 and 1000 angstroms are employed. 

In other preferred applications, non-porous microparticles are employed for
their optical properties, which may be advantageously used when tracking large
numbers of microparticles on planar supports, such as a microscope slide.
Particularly preferred non-porous microparticles are the glycidyl methacrylate (GMA)
beads available from Bangs Laboratories (Carmel, IN). Such microparticles are
useful in a variety of sizes and derivatized with a variety of linkage groups for
synthesizing tags or tag complements. Preferably, for massively parallel
manipulations of tagged microparticles, 5 µm diameter GMA beads are employed.

Attaching Tags to Polynucleotides 
For Sorting onto Solid Phase Supports 
An important aspect of the invention is the sorting and attachment of
populations of polynucleotides, e.g. from a cDNA library, to microparticles or to
separate regions on a solid phase support such that each microparticle or region has
substantially only one kind of polynucleotide attached. This objective is
accomplished by ensuring that substantially all different polynucleotides have
different tags attached. This condition, in turn, is brought about by taking a sample of
the full ensemble of tag-polynucleotide conjugates for analysis. (It is acceptable that
identical polynucleotides have different tags, as it merely results in the same
polynucleotide being operated on or analyzed twice in two different locations.) Such
sampling can be carried out overtly after the tags have been attached to the
polynucleotides, for example by taking a small volume from a larger mixture; it can
be carried out inherently, as a secondary effect of the techniques used to process the
polynucleotides and tags; or it can be carried out both overtly and as an inherent part
of processing steps.

Preferably, in constructing a cDNA library where substantially all different
cDNAs have different tags, a tag repertoire is employed whose complexity, or number
of distinct tags, greatly exceeds the total number of mRNAs extracted from a cell or
tissue sample. Preferably, the complexity of the tag repertoire is at least 10 times that
of the polynucleotide population; and more preferably, the complexity of the tag
repertoire is at least 100 times that of the polynucleotide population. Below, a
protocol is disclosed for cDNA library construction using a primer mixture that
contains a full repertoire of exemplary 9-word tags. Such a mixture of tag-containing
primers has a complexity of 8^9, or about 1.34 x 10^8. As indicated by Winslow et al,
Nucleic Acids Research, 19: 3251-3253 (1991), mRNA for library construction can
be extracted from as few as 10-100 mammalian cells. Since a single mammalian cell
contains about 5 x 10^5 copies of mRNA molecules of about 3.4 x 10^4 different kinds,
by standard techniques one can isolate the mRNA from about 100 cells, or
(theoretically) about 5 x 10^7 mRNA molecules. Comparing this number to the
complexity of the primer mixture shows that without any additional steps, and even
assuming that mRNAs are converted into cDNAs with perfect efficiency (1%
efficiency or less is more accurate), the cDNA library construction protocol results in
a population containing no more than 37% of the total number of different tags. That
is, without any overt sampling step at all, the protocol inherently generates a sample
that comprises 37%, or less, of the tag repertoire. The probability of obtaining a
double under these conditions is about 5%, which is within the preferred range. With
mRNA from 10 cells, the fraction of the tag repertoire sampled is reduced to only
3.7%, even assuming that all the processing steps take place at 100% efficiency. In
fact, the efficiencies of the processing steps for constructing cDNA libraries are very
low, a "rule of thumb" being that a good library should contain about 10^8 cDNA clones
from mRNA extracted from 10^6 mammalian cells.
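
The inherent-sampling arithmetic above can be reproduced in a few lines. The sketch below is illustrative only: the constant and function names are hypothetical, and the per-cell mRNA count and the 9-word, 8-words-per-position repertoire are the figures given in the text.

```python
WORDS_PER_POSITION = 8      # size of the word set used at each tag position
POSITIONS_PER_TAG = 9       # exemplary 9-word tags
MRNA_PER_CELL = 5e5         # approximate mRNA copies in one mammalian cell


def repertoire_complexity():
    """Number of distinct tags in the primer mixture (8^9)."""
    return WORDS_PER_POSITION ** POSITIONS_PER_TAG


def fraction_of_repertoire(n_cells, efficiency=1.0):
    """Upper bound on the fraction of the tag repertoire that ends up on
    cDNAs: number of converted mRNA molecules divided by number of tags."""
    molecules = n_cells * MRNA_PER_CELL * efficiency
    return molecules / repertoire_complexity()


if __name__ == "__main__":
    print(repertoire_complexity())                 # 134217728
    print(round(fraction_of_repertoire(100), 3))   # about 0.37
    print(round(fraction_of_repertoire(10), 3))    # about 0.037
```

Passing a realistic `efficiency` (the text suggests 1% or less) lowers the sampled fraction proportionally.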

When larger amounts of mRNA are used in the above protocol, or more
generally when the number of polynucleotides exceeds the complexity of the tag
repertoire, a tag-polynucleotide conjugate mixture potentially contains every possible
pairing of tags and types of mRNA or polynucleotide. In such cases, overt sampling
may be implemented by removing a sample volume after a serial dilution of the
starting mixture of tag-polynucleotide conjugates. The amount of dilution required
depends on the amount of starting material and the efficiencies of the processing
steps, which are readily estimated.

If mRNA were extracted from 10^6 cells (which would correspond to about 0.5
µg of poly(A)+ RNA), and if primers were present in about 10-100 fold concentration
excess, as is called for in a typical protocol, e.g. Sambrook et al, Molecular Cloning,
Second Edition, page 8.61 [10 µL of 1.8 kb mRNA at 1 mg/mL equals about 1.68 x 10^-11
moles and 10 µL of 18-mer primer at 1 mg/mL equals about 1.68 x 10^-9 moles], then the
total number of tag-polynucleotide conjugates in a cDNA library would simply be
equal to or less than the starting number of mRNAs, or about 5 x 10^11 vectors
containing tag-polynucleotide conjugates. Again, this assumes that each step in cDNA
construction (first strand synthesis, second strand synthesis, ligation into a vector)
occurs with perfect efficiency, which is a very conservative estimate. The actual
number is significantly less.

If a sample of n tag-polynucleotide conjugates is randomly drawn from a
reaction mixture, as could be effected by taking a sample volume, the probability of
drawing conjugates having the same tag is described by the Poisson distribution,
P(r) = e^(-λ) λ^r / r!, where r is the number of conjugates having the same tag and λ = np,
where p is the probability of a given tag being selected. If n = 10^6 and p = 1/(1.34 x
10^8), then λ = .00746 and P(2) = 2.76 x 10^-5. Thus, a sample of one million molecules
gives rise to an expected number of doubles well within the preferred range. Such a
sample is readily obtained as follows: Assume that the 5 x 10^11 mRNAs are perfectly
converted into 5 x 10^11 vectors with tag-cDNA conjugates as inserts and that the 5 x
10^11 vectors are in a reaction solution having a volume of 100 µl. Four 10-fold serial
dilutions may be carried out by transferring 10 µl from the original solution into a
vessel containing 90 µl of an appropriate buffer, such as TE. This process may be
repeated for three additional dilutions to obtain a 100 µl solution containing 5 x 10^5
vector molecules per µl. A 2 µl aliquot from this solution yields 10^6 vectors
containing tag-cDNA conjugates as inserts. This sample is then amplified by
straightforward transformation of a competent host cell followed by culturing.
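
The Poisson estimate and the serial-dilution arithmetic above can be checked with a short calculation. This is a sketch under the idealized assumptions stated in the text (perfect conversion, uniform tag usage); the function names are hypothetical.

```python
import math


def poisson(r, lam):
    """P(r) = e^(-lam) * lam^r / r!"""
    return math.exp(-lam) * lam ** r / math.factorial(r)


def concentration_after_dilutions(molecules, volume_ul, n_dilutions, fold=10):
    """Molecules per microliter after `n_dilutions` successive `fold`-fold dilutions."""
    return molecules / volume_ul / fold ** n_dilutions


if __name__ == "__main__":
    n = 10 ** 6                 # conjugates in the sample
    p = 1 / 1.34e8              # chance that a given tag is selected
    lam = n * p
    print(f"lambda = {lam:.5f}")                # about 0.00746
    print(f"P(2)   = {poisson(2, lam):.3g}")    # about 2.76e-05

    # 5 x 10^11 vectors in 100 ul, four 10-fold dilutions, 2 ul aliquot
    per_ul = concentration_after_dilutions(5e11, 100, 4)
    print(f"{per_ul:.3g} vectors per ul, {2 * per_ul:.3g} in a 2 ul aliquot")
```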

Of course, as mentioned above, no step in the above process proceeds with
perfect efficiency. In particular, when vectors are employed to amplify a sample of
tag-polynucleotide conjugates, the step of transforming a host is very inefficient.
Usually, no more than 1% of the vectors are taken up by the host and replicated.
Thus, for such a method of amplification, even fewer dilutions would be required to
obtain a sample of 10^6 conjugates.

A repertoire of oligonucleotide tags can be conjugated to a population of
polynucleotides in a number of ways, including direct enzymatic ligation,
amplification, e.g. via PCR, using primers containing the tag sequences, and the like.
The initial ligating step produces a very large population of tag-polynucleotide
conjugates such that a single tag is generally attached to many different
polynucleotides. However, as noted above, by taking a sufficiently small sample of
the conjugates, the probability of obtaining "doubles," i.e. the same tag on two
different polynucleotides, can be made negligible. Generally, the larger the sample
the greater the probability of obtaining a double. Thus, a design trade-off exists
between selecting a large sample of tag-polynucleotide conjugates, which, for
example, ensures adequate coverage of a target polynucleotide in a shotgun
sequencing operation or adequate representation of a rapidly changing mRNA pool,
and selecting a small sample, which ensures that a minimal number of doubles will be
present. In most embodiments, the presence of doubles merely adds an additional
source of noise or, in the case of sequencing, a minor complication in scanning and
signal processing, as microparticles giving multiple fluorescent signals can simply be
ignored.

As used herein, the term "substantially all" in reference to attaching tags to
molecules, especially polynucleotides, is meant to reflect the statistical nature of the
sampling procedure employed to obtain a population of tag-molecule conjugates
essentially free of doubles. The meaning of substantially all in terms of actual
percentages of tag-molecule conjugates depends on how the tags are being employed.
Preferably, for nucleic acid sequencing, substantially all means that at least eighty
percent of the polynucleotides have unique tags attached. More preferably, it means
that at least ninety percent of the polynucleotides have unique tags attached. Still
more preferably, it means that at least ninety-five percent of the polynucleotides have
unique tags attached. And, most preferably, it means that at least ninety-nine percent
of the polynucleotides have unique tags attached.

Preferably, when the population of polynucleotides consists of messenger
RNA (mRNA), oligonucleotide tags may be attached by reverse transcribing the
mRNA with a set of primers preferably containing complements of tag sequences.
An exemplary set of such primers could have the following sequence (SEQ ID NO:
1):

      5'-mRNA-[A]n-3'
              [T]19GG[W,W,W,C]9ACCAGCTGATC-5'-biotin

where "[W,W,W,C]9" represents the sequence of an oligonucleotide tag of nine
subunits of four nucleotides each and "[W,W,W,C]" represents the subunit sequences
listed above, i.e. "W" represents T or A. The underlined sequences identify an
optional restriction endonuclease site that can be used to release the polynucleotide
from attachment to a solid phase support via the biotin, if one is employed. For the
above primer, the complement attached to a microparticle could have the form:



      5'-[G,W,W,W]9TGG-linker-microparticle

After reverse transcription, the mRNA is removed, e.g. by RNase H digestion,
and the second strand of the cDNA is synthesized using, for example, a primer of the
following form (SEQ ID NO: 2):

      5'-NRRGATCYNNN-3'

where N is any one of A, T, G, or C; R is a purine-containing nucleotide, and Y is a
pyrimidine-containing nucleotide. This particular primer creates a Bst YI restriction
site in the resulting double stranded DNA which, together with the Sal I site,
facilitates cloning into a vector with, for example, Bam HI and Xho I sites. After Bst
YI and Sal I digestion, the exemplary conjugate would have the form:

      5'-RCGACCA[C,W,W,W]9GG[T]19- cDNA -NNNR
            GGT[G,W,W,W]9CC[A]19- rDNA -NNNYCTAG-5'

The polynucleotide-tag conjugates may then be manipulated using standard molecular
biology techniques. For example, the above conjugate, which is actually a mixture,
may be inserted into commercially available cloning vectors, e.g. Stratagene Cloning
System (La Jolla, CA), and transfected into a host, such as a commercially available
host bacterium, which is then cultured to increase the number of conjugates. The
cloning vectors may then be isolated using standard techniques, e.g. Sambrook et al,
Molecular Cloning, Second Edition (Cold Spring Harbor Laboratory, New York,
1989). Alternatively, appropriate adaptors and primers may be employed so that the
conjugate population can be increased by PCR.
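
The degenerate second-strand primer 5'-NRRGATCYNNN-3' described above stands for a mixture of sequences. The sketch below counts the members of that mixture and locates the Bst YI site within it. It assumes the standard Bst YI recognition sequence RGATCY and the N, R, and Y definitions given in the text; the function names are hypothetical.

```python
# Bases allowed by each symbol (N, R, Y as defined in the text).
SYMBOLS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",      # purine
    "Y": "CT",      # pyrimidine
    "N": "ACGT",    # any base
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for symbol in primer:
        count *= len(SYMBOLS[symbol])
    return count


def site_offset(primer, site):
    """Offset at which every member of `primer` matches degenerate `site`,
    or -1 if there is no such offset."""
    for start in range(len(primer) - len(site) + 1):
        window = primer[start:start + len(site)]
        if all(set(SYMBOLS[p]) <= set(SYMBOLS[s]) for p, s in zip(window, site)):
            return start
    return -1


if __name__ == "__main__":
    primer = "NRRGATCYNNN"
    print(degeneracy(primer))             # 2048
    print(site_offset(primer, "RGATCY"))  # 2
```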

Preferably, when the ligase-based method of sequencing is employed, the Bst
YI and Sal I digested fragments are cloned into a Bam HI-/Xho I-digested vector
having the following single-copy restriction sites (SEQ ID NO: 3):

      5'-GAGGATGCCTTTATGGATCCACTCGAGATCCCAATCCA-3'
           Fok I       Bam HI Xho I

This adds the Fok I site which will allow initiation of the sequencing process
discussed more fully below.
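
That each labeled site occurs once in the vector segment can be confirmed by a simple scan of both strands. The sketch below assumes the standard recognition sequences (Fok I: GGATG, Bam HI: GGATCC, Xho I: CTCGAG) and reads the Fok I position of the scanned sequence as GGATG, which is an assumption where the scan is ambiguous; the function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_site(seq, site):
    """Occurrences of a recognition site on either strand of `seq`."""
    targets = {site, reverse_complement(site)}   # one entry if palindromic
    return sum(seq.startswith(t, i) for i in range(len(seq)) for t in targets)


# Vector segment of SEQ ID NO: 3, with the Fok I site read as GGATG (assumption).
VECTOR = "GAGGATGCCTTTATGGATCCACTCGAGATCCCAATCCA"

SITES = {"Fok I": "GGATG", "Bam HI": "GGATCC", "Xho I": "CTCGAG"}

if __name__ == "__main__":
    for name, site in SITES.items():
        print(name, count_site(VECTOR, site), "at", VECTOR.find(site))
```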

Tags can be conjugated to cDNAs of existing libraries by standard cloning
methods. cDNAs are excised from their existing vector, isolated, and then ligated into
a vector containing a repertoire of tags. Preferably, the tag-containing vector is
linearized by cleaving with two restriction enzymes so that the excised cDNAs can be
ligated in a predetermined orientation. The concentration of the linearized tag-
containing vector is in substantial excess over that of the cDNA inserts so that
ligation provides an inherent sampling of tags.

A general method for exposing the single stranded tag after amplification
involves digesting a target polynucleotide-containing conjugate with the 3'->5'
exonuclease activity of T4 DNA polymerase, or a like enzyme. When used in the
presence of a single deoxynucleoside triphosphate, such a polymerase will cleave
nucleotides from 3' recessed ends present on the non-template strand of a double
stranded fragment until a complement of the single deoxynucleoside triphosphate is
reached on the template strand. When such a nucleotide is reached the 3'->5'
digestion effectively ceases, as the polymerase's extension activity adds nucleotides at
a higher rate than the excision activity removes nucleotides. Consequently, single
stranded tags constructed with three nucleotides are readily prepared for loading onto
solid phase supports.
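
The stopping rule of the stripping reaction can be illustrated with a toy model: nucleotides are removed from the 3' end of the digested strand until the first position is reached at which the single supplied nucleotide can be re-incorporated. The sketch below is a simplification that ignores kinetics; the function name and the example sequence are hypothetical.

```python
def strip_3prime(strand, dntp):
    """Portion of `strand` (written 5'->3') that survives digestion from the
    3' end when only `dntp` is available for re-incorporation.

    Digestion proceeds until the 3'-terminal base equals `dntp`; if the
    strand contains no such base, nothing survives in this simple model.
    """
    stop = strand.rfind(dntp)
    return strand[: stop + 1]


if __name__ == "__main__":
    # Hypothetical digested strand: flanking sequence ending in CC, followed
    # by a tag region built only from A, G and T (three nucleotides).
    flank = "TTGACGTACC"
    tag_region = "GATTTGATTAGA"
    remaining = strip_3prime(flank + tag_region, "C")
    print(remaining)                                   # TTGACGTACC
    print(len(flank + tag_region) - len(remaining))    # 12 bases exposed
```

Because the tag region lacks the supplied nucleotide, digestion runs through the whole tag and halts at the first flanking base that matches it, leaving the opposite strand of the tag single stranded.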

The technique may also be used to preferentially methylate interior Fok I sites
of a target polynucleotide while leaving a single Fok I site at the terminus of the
polynucleotide unmethylated. First, the terminal Fok I site is rendered single stranded
using a polymerase with deoxycytidine triphosphate. The double stranded portion of
the fragment is then methylated, after which the single stranded terminus is filled in
with a DNA polymerase in the presence of all four nucleoside triphosphates, thereby
regenerating the Fok I site. Clearly, this procedure can be generalized to
endonucleases other than Fok I.

After the oligonucleotide tags are prepared for specific hybridization, e.g. by
rendering them single stranded as described above, the polynucleotides are mixed
with microparticles containing the complementary sequences of the tags under
conditions that favor the formation of perfectly matched duplexes between the tags
and their complements. There is extensive guidance in the literature for creating these
conditions. Exemplary references providing such guidance include Wetmur, Critical
Reviews in Biochemistry and Molecular Biology, 26: 227-259 (1991); Sambrook et
al, Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor
Laboratory, New York, 1989); and the like. Preferably, the hybridization conditions
are sufficiently stringent so that only perfectly matched sequences form stable
duplexes. Under such conditions the polynucleotides specifically hybridized through
their tags may be ligated to the complementary sequences attached to the
microparticles. Finally, the microparticles are washed to remove polynucleotides with
unligated and/or mismatched tags.

When CPG microparticles conventionally employed as synthesis supports are
used, the density of tag complements on the microparticle surface is typically greater
than that necessary for some sequencing operations. That is, in sequencing
approaches that require successive treatment of the attached polynucleotides with a
variety of enzymes, densely spaced polynucleotides may tend to inhibit access of the
relatively bulky enzymes to the polynucleotides. In such cases, the polynucleotides
are preferably mixed with the microparticles so that tag complements are present in
significant excess, e.g. from 10:1 to 100:1, or greater, over the polynucleotides. This
ensures that the density of polynucleotides on the microparticle surface will not be so
high as to inhibit enzyme access. Preferably, the average inter-polynucleotide spacing
on the microparticle surface is on the order of 30-100 nm. Guidance in selecting
ratios for standard CPG supports and Ballotini beads (a type of solid glass support) is
found in Maskos and Southern, Nucleic Acids Research, 20: 1679-1684 (1992).
Preferably, for sequencing applications, standard CPG beads of diameter in the range
of 20-50 µm are loaded with about 10^5 polynucleotides, and GMA beads of diameter
in the range of 5-10 µm are loaded with a few tens of thousands of polynucleotides,
e.g. 4 x 10^4 to 6 x 10^4.

In the preferred embodiment, tag complements are synthesized on
microparticles combinatorially; thus, at the end of the synthesis, one obtains a
complex mixture of microparticles from which a sample is taken for loading tagged
polynucleotides. The size of the sample of microparticles will depend on several
factors, including the size of the repertoire of tag complements, the nature of the
apparatus used for observing loaded microparticles (e.g. its capacity), the tolerance
for multiple copies of microparticles with the same tag complement (i.e. "bead
doubles"), and the like. The following table provides guidance regarding
microparticle sample size, microparticle diameter, and the approximate physical
dimensions of a packed array of microparticles of various diameters.

Microparticle diameter          5 µm           10 µm        20 µm          40 µm

Max. no. polynucleotides
loaded at 1 per 10^5 sq.
angstrom                                       3 x 10^5     1.26 x 10^6    5 x 10^6

Approx. area of monolayer
of 10^6 microparticles          .45 x .45 cm   1 x 1 cm     2 x 2 cm       4 x 4 cm
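
The loading figures in the table above are consistent with treating each microparticle as a smooth sphere and allowing one polynucleotide per 10^5 square angstroms of surface. The sketch below makes that assumption explicit; it is a geometric estimate only, and the function name is hypothetical.

```python
import math

ANGSTROMS_PER_MICRON = 1e4


def max_polynucleotides(diameter_um, area_per_molecule_sq_angstrom=1e5):
    """Sphere surface area (pi * d^2) divided by the area allotted to each
    polynucleotide, for a smooth, non-porous bead."""
    diameter_angstrom = diameter_um * ANGSTROMS_PER_MICRON
    return math.pi * diameter_angstrom ** 2 / area_per_molecule_sq_angstrom


if __name__ == "__main__":
    for d in (10, 20, 40):
        print(f"{d} um: {max_polynucleotides(d):.3g}")
```

For 10, 20 and 40 µm beads this gives roughly 3.1 x 10^5, 1.26 x 10^6 and 5.0 x 10^6 molecules, matching the tabulated values to the precision shown.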

The probability that the sample of microparticles contains a given tag complement,
or that a given tag complement is present in multiple copies, is described by the
Poisson distribution, as indicated in the following table.

                              Table VII

Number of             Fraction of          Fraction of            Fraction of
microparticles in     repertoire of tag    microparticles in      microparticles in
sample (as fraction   complements          sample with unique     sample carrying same
of repertoire size),  present in           tag complement         tag complement as one
m                     sample,              attached,              other microparticle
                      1-e^(-m)             m(e^(-m))              in sample
                                                                  ("bead doubles"),
                                                                  m^2(e^(-m))/2

1.000                 0.63                 0.37                   0.18
 .693                 0.50                 0.35                   0.12
 .405                 0.33                 0.27                   0.05
 .285                 0.25                 0.21                   0.03
 .223                 0.20                 0.18                   0.02
 .105                 0.10                 0.09                   0.005
 .010                 0.01                 0.01
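
The entries of Table VII can be regenerated from the Poisson terms for zero, one and two occurrences. In the sketch below the second column is computed as m(e^(-m)), which is the expression the tabulated values match; the function name is hypothetical.

```python
import math


def table_vii_row(m):
    """Return (fraction of repertoire present, fraction unique, fraction in
    bead doubles) for a sample of m repertoire-equivalents of microparticles."""
    p0 = math.exp(-m)
    present = 1 - p0             # tag complement appears at least once
    unique = m * p0              # Poisson term for exactly one occurrence
    doubles = m ** 2 * p0 / 2    # Poisson term for exactly two occurrences
    return present, unique, doubles


if __name__ == "__main__":
    for m in (1.000, 0.693, 0.405, 0.285, 0.223, 0.105, 0.010):
        present, unique, doubles = table_vii_row(m)
        print(f"{m:5.3f}  {present:.2f}  {unique:.2f}  {doubles:.3f}")
```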



High Specificity Sorting and Panning
The kinetics of sorting depends on the rate of hybridization of oligonucleotide
tags to their tag complements which, in turn, depends on the complexity of the tags in
the hybridization reaction. Thus, a trade-off exists between sorting rate and tag
complexity, such that an increase in sorting rate may be achieved at the cost of
reducing the complexity of the tags involved in the hybridization reaction. As
explained below, the effects of this trade-off may be ameliorated by "panning."

Specificity of the hybridizations may be increased by taking a sufficiently
small sample so that both a high percentage of tags in the sample are unique and the
nearest neighbors of substantially all the tags in a sample differ by at least two words.
This latter condition may be met by taking a sample that contains a number of tag-
polynucleotide conjugates that is about 0.1 percent or less of the size of the repertoire
being employed. For example, if tags are constructed with eight words selected from
Table II, a repertoire of 8^8, or about 1.67 x 10^7, tags and tag complements are
produced. In a library of tag-cDNA conjugates as described above, a 0.1 percent
sample means that about 16,700 different tags are present. If this were loaded directly
onto a repertoire-equivalent of microparticles, or in this example a sample of 1.67 x
10^7 microparticles, then only a sparse subset of the sampled microparticles would be
loaded. The density of loaded microparticles can be increased, for example for more
efficient sequencing, by undertaking a "panning" step in which the sampled tag-
cDNA conjugates are used to separate loaded microparticles from unloaded
microparticles. Thus, in the example above, even though a "0.1 percent" sample
contains only 16,700 cDNAs, the sampling and panning steps may be repeated until
as many loaded microparticles as desired are accumulated.
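
A rough estimate shows why a 0.1 percent sample leaves most tags without a one-word neighbor. The sketch below rests on a simplifying assumption that is not part of the text: every other tag of the repertoire is taken to be present in the sample independently, with probability equal to the sampled fraction. The names are hypothetical.

```python
WORD_SET_SIZE = 8     # words available at each position
WORDS_PER_TAG = 8     # eight-word tags, as in the example above


def repertoire_size():
    """Total number of distinct tags (8^8)."""
    return WORD_SET_SIZE ** WORDS_PER_TAG


def one_word_neighbors():
    """Tags differing from a given tag in exactly one word."""
    return WORDS_PER_TAG * (WORD_SET_SIZE - 1)


def prob_no_one_word_neighbor(sample_fraction):
    """Chance that none of a tag's one-word neighbors is in the sample,
    under the independence assumption stated above."""
    return (1 - sample_fraction) ** one_word_neighbors()


if __name__ == "__main__":
    fraction = 0.001
    print(repertoire_size())                       # 16777216
    print(round(fraction * repertoire_size()))     # 16777 tags in the sample
    print(one_word_neighbors())                    # 56
    print(round(prob_no_one_word_neighbor(fraction), 3))
```

Under this model about 95 percent of sampled tags have no one-word neighbor in the sample, so nearly all nearest neighbors differ by at least two words.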

A panning step may be implemented by providing a sample of tag-cDNA 
conjugates each of which contains a capture moiety at an end opposite, or distal to, 
the oligonucleotide tag. Preferably, the capture moiety is of a type which can be 
released from the tag-cDNA conjugates, so that the tag-cDNA conjugates can be 
sequenced with a single-base sequencing method. Such moieties may comprise 
biotin, digoxigenin, or like ligands, a triplex binding region, or the like. Preferably, 
such a capture moiety comprises a biotin component. Biotin may be attached to tag- 
cDNA conjugates by a number of standard techniques. If appropriate adapters 
containing PCR primer binding sites are attached to tag-cDNA conjugates, biotin may 
be attached by using a biotinylated primer in an amplification after sampling. 
Alternatively, if the tag-cDNA conjugates are inserts of cloning vectors, biotin may be 
attached after excising the tag-cDNA conjugates by digestion with an appropriate 
restriction enzyme followed by isolation and filling in a protruding strand distal to the 
tags with a DNA polymerase in the presence of biotinylated uridine triphosphate. 

After a tag-cDNA conjugate is captured, it may be released from the biotin 
moiety in a number of ways, such as by a chemical linkage that is cleaved by 
reduction, e.g. Herman et al, Anal. Biochem., 156: 48-55 (1986), or that is cleaved 
photochemically, e.g. Olejnik et al, Nucleic Acids Research, 24: 361-366 (1996), or 
that is cleaved enzymatically by introducing a restriction site in the PCR primer. The 
latter embodiment can be exemplified by considering the library of tag-polynucleotide 
conjugates described above: 

5'-RCGACCA[C,W,W,W]9GG[T]19- cDNA -NNNR 
       GGT[G,W,W,W]9CC[A]19- cDNA -NNNYCTAG-5' 

The following adapters may be ligated to the ends of these fragments to permit 
amplification by PCR: 



5'-XXXXXXXXXXXXXXXXXXXX 
   XXXXXXXXXXXXXXXXXXXXYGAT 
        Right Adapter 

GATCZZACTAGTZZZZZZZZZZZZ-3' 
    ZZTGATCAZZZZZZZZZZZZ 
        Left Adapter 

    ZZTGATCAZZZZZZZZZZZZ-5'-biotin 
        Left Primer 

where "ACTAGT" is a Spe I recognition site (which leaves a staggered cleavage 
ready for single base sequencing), and the X's and Z's are nucleotides selected so that 
the annealing and dissociation temperatures of the respective primers are 
approximately the same. After ligation of the adapters and amplification by PCR 
using the biotinylated primer, the tags of the conjugates are rendered single stranded 
by the exonuclease activity of T4 DNA polymerase and conjugates are combined with 
a sample of microparticles, e.g. a repertoire equivalent, with tag complements 
attached. After annealing under stringent conditions (to minimize mis-attachment of 
tags), the conjugates are preferably ligated to their tag complements and the loaded 
microparticles are separated from the unloaded microparticles by capture with 
avidinated magnetic beads, or like capture technique. 

Returning to the example, this process results in the accumulation of about 
10,500 (= 16,700 x 0.63) loaded microparticles with different tags, which may be 
released from the magnetic beads by cleavage with Spe I. By repeating this process 
40-50 times with new samples of microparticles and tag-cDNA conjugates, 4-5 x 10^5 
cDNAs can be accumulated by pooling the released microparticles. The pooled 
microparticles may then be simultaneously sequenced by a single-base sequencing 
technique. 
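
The factor 0.63 is not derived in the text. One reading, offered here only as an assumption, is that it approximates 1 - 1/e: the chance that a given tag finds at least one complementary microparticle when a repertoire-equivalent number of microparticles is drawn at random. The Python sketch below works through the example on that assumption; the function names are illustrative.

```python
import math


def fraction_with_complement(beads: float, repertoire: float) -> float:
    """Chance that a given tag has at least one complementary microparticle
    among `beads` microparticles drawn uniformly at random from a repertoire
    of `repertoire` tag complements (Poisson approximation)."""
    return 1.0 - math.exp(-beads / repertoire)


def rounds_needed(target_loaded: float, loaded_per_round: float) -> int:
    """Number of sampling-and-panning rounds needed to reach the target."""
    return math.ceil(target_loaded / loaded_per_round)


repertoire = 1.67e7      # tag complements, as in the example
sampled_tags = 16_700    # a 0.1 percent sample
loaded_per_round = sampled_tags * fraction_with_complement(repertoire, repertoire)

print(round(loaded_per_round))   # close to the 10,500 quoted in the text
print(rounds_needed(4e5, loaded_per_round), rounds_needed(5e5, loaded_per_round))
```

On this reading, roughly 40 to 50 rounds are needed to accumulate 4-5 x 10^5 loaded microparticles, in line with the figures given above.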

Determining how many times to repeat the sampling and panning steps, or 
more generally, determining how many cDNAs to analyze, depends on one's 
objective. If the objective is to monitor the changes in abundance of relatively 
common sequences, e.g. making up 5% or more of a population, then relatively small 
samples, i.e. a small fraction of the total population size, may allow statistically 
significant estimates of relative abundances. On the other hand, if one seeks to 
monitor the abundances of rare sequences, e.g. making up 0.1% or less of a 
population, then large samples are required. Generally, there is a direct relationship 
between sample size and the reliability of the estimates of relative abundances based 
on the sample. There is extensive guidance in the literature on determining 
appropriate sample sizes for making reliable statistical estimates, e.g. Koller et al, 
Nucleic Acids Research, 23:185-191 (1994); Good, Biometrika, 40: 16-264 (1953); 
Bunge et al, J. Am. Stat. Assoc., 88: 364-373 (1993); and the like. Preferably, for 
monitoring changes in gene expression based on the analysis of a series of cDNA 
libraries containing 10^5 to 10^8 independent clones of 3.0-3.5 x 10^4 different 
sequences, a sample of at least 10^4 sequences is accumulated for analysis of each 
library. More preferably, a sample of at least 10^5 sequences is accumulated for the 
analysis of each library; and most preferably, a sample of at least 5 x 10^5 sequences 
is accumulated for the analysis of each library. Alternatively, the number of 
sequences sampled is preferably sufficient to estimate the relative abundance of a 
sequence present at a frequency within the range of 0.1% to 5% with a 95% 
confidence limit no larger than 0.1% of the population size. 
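
The last criterion can be illustrated with a standard sample-size calculation. The Python sketch below uses the normal approximation to the binomial distribution and reads "a 95% confidence limit no larger than 0.1% of the population size" as a half-width of 0.001 on the estimated frequency. Both the approximation and that reading are assumptions made for illustration, and the function name is hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95 percent quantile of the standard normal distribution


def min_sample_size(frequency: float, half_width: float, z: float = Z_95) -> int:
    """Smallest sample size for which the normal-approximation confidence
    interval on a proportion `frequency` has the given half-width."""
    return math.ceil(z * z * frequency * (1.0 - frequency) / half_width ** 2)


for frequency in (0.001, 0.01, 0.05):
    print(frequency, min_sample_size(frequency, 0.001))
```

Under these assumptions the required sample grows from a few thousand sequences at a frequency of 0.1% to somewhat under 2 x 10^5 at 5%, the same order of magnitude as the preferred sample sizes stated above.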

Single Base DNA Sequencing 
The present invention can be employed with conventional methods of DNA 
sequencing, e.g. as disclosed by Hultman et al, Nucleic Acids Research, 17: 4937-4946 
(1989). However, for parallel, or simultaneous, sequencing of multiple 
polynucleotides, a DNA sequencing methodology is preferred that requires neither 
electrophoretic separation of closely sized DNA fragments nor analysis of cleaved 
nucleotides by a separate analytical procedure, as in peptide sequencing. Preferably, 
the methodology permits the stepwise identification of nucleotides, usually one at a 
time, in a sequence through successive cycles of treatment and detection. Such 
methodologies are referred to herein as "single base" sequencing methods. Single 
base approaches are disclosed in the following references: Cheeseman, U.S. patent 
5,302,509; Tsien et al, International application WO 91/06678; Rosenthal et al, 
International application WO 93/21340; Canard et al, Gene, 148: 1-6 (1994); and 
Metzker et al, Nucleic Acids Research, 22: 4259-4267 (1994). 

A "single base" method of DNA sequencing which is suitable for use with the 
present invention and which requires no electrophoretic separation of DNA fragments 
is described in International application PCT/US95/03678. Briefly, the method 
comprises the following steps: (a) ligating a probe to an end of the polynucleotide 
having a protruding strand to form a ligated complex, the probe having a 
complementary protruding strand to that of the polynucleotide and the probe having a 
nuclease recognition site; (b) removing unligated probe from the ligated complex; (c) 
identifying one or more nucleotides in the protruding strand of the polynucleotide by 
the identity of the ligated probe; (d) cleaving the ligated complex with a nuclease; and 
(e) repeating steps (a) through (d) until the nucleotide sequence of the polynucleotide, 
or a portion thereof, is determined. 
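
Steps (a) through (e) form a simple loop, which the following Python sketch models in a deliberately simplified way. It treats the target as a string read from its exposed end, ignores strand orientation and enzyme chemistry, and assumes that each cycle exposes a fixed number of bases; the function name and parameters are illustrative only.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_ligation_cycles(target: str, cycles: int, bases_per_cycle: int = 1) -> str:
    """Toy model of steps (a)-(e): each cycle identifies the bases exposed in
    the protruding strand, then 'cleaves' so that the next bases are exposed."""
    read = []
    remaining = target
    for _ in range(cycles):
        if len(remaining) < bases_per_cycle:
            break
        overhang = remaining[:bases_per_cycle]
        # (a) only the probe complementary to the overhang is ligated
        probe = "".join(COMPLEMENT[base] for base in overhang)
        # (b) unligated probe is removed; nothing to model in this sketch
        # (c) the identity of the ligated probe reveals the target bases
        read.append("".join(COMPLEMENT[base] for base in probe))
        # (d) nuclease cleavage shortens the target and exposes new bases
        remaining = remaining[bases_per_cycle:]
    # (e) the cycles above are repeated until enough sequence is read
    return "".join(read)


print(sequence_by_ligation_cycles("GATTACA", cycles=4))
```

Each pass through the loop corresponds to one cycle of ligation, identification, and cleavage, so the number of cycles sets the length of sequence obtained.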

A single signal generating moiety, such as a single fluorescent dye, may be 
employed when sequencing several different target polynucleotides attached to 
different spatially addressable solid phase supports, such as fixed microparticles, in a 
parallel sequencing operation. This may be accomplished by providing four sets of 
probes that are applied sequentially to the plurality of target polynucleotides on the 
different microparticles. An exemplary set of such probes is shown below: 

where each of the listed probes represents a mixture of 4^3 = 64 oligonucleotides such 
that the identity of the 3' terminal nucleotide of the top strand is fixed and the other 
positions in the protruding strand are filled by every 3-mer permutation of nucleotides, 
or complexity reducing analogs. The listed probes are also shown with a single 
stranded poly-T tail with a signal generating moiety attached to the terminal thymidine, 
shown as "T*". The "d" on the unlabeled probes designates a ligation-blocking moiety 
or absence of 3'-hydroxyl, which prevents unlabeled probes from being ligated. 
Preferably, such 3'-terminal nucleotides are dideoxynucleotides. In this embodiment, 
the probes of set 1 are first applied to the plurality of target polynucleotides and treated 
with a ligase so that target polynucleotides having a thymidine complementary to the 3' 
terminal adenosine of the labeled probes are ligated. The unlabeled probes are 
simultaneously applied to minimize inappropriate ligations. The locations of the target 
polynucleotides that form ligated complexes with probes terminating in "A" are 
identified by the signal generated by the label carried on the probe. After washing and 
cleavage, the probes of set 2 are applied. In this case, target polynucleotides forming 
ligated complexes with probes terminating in "C" are identified by location. Similarly, 
the probes of sets 3 and 4 are applied and locations of positive signals identified. This 
process of sequentially applying the four sets of probes continues until the desired 
number of nucleotides are identified on the target polynucleotides. Clearly, one of 
ordinary skill could construct similar sets of probes that could have many variations, 
such as having protruding strands of different lengths, different moieties to block 
ligation of unlabeled probes, different means for labeling probes, and the like. 

Apparatus for Sequencing Populations of Polynucleotides 
An objective of the invention is to sort identical molecules, particularly 
polynucleotides, onto the surfaces of microparticles by the specific hybridization of 
tags and their complements. Once such sorting has taken place, the presence of the 
molecules or operations performed on them can be detected in a number of ways 
depending on the nature of the tagged molecule, whether microparticles are detected 
separately or in "batches," whether repeated measurements are desired, and the like. 
Typically, the sorted molecules are exposed to ligands for binding, e.g. in drug 
development, or are subjected to chemical or enzymatic processes, e.g. in polynucleotide 
sequencing. In both of these uses it is often desirable to simultaneously observe 
signals corresponding to such events or processes on large numbers of microparticles. 
Microparticles carrying sorted molecules (referred to herein as "loaded" 
microparticles) lend themselves to such large scale parallel operations, e.g. as 
demonstrated by Lam et al (cited above). 

Preferably, whenever light-generating signals, e.g. chemiluminescent, 
fluorescent, or the like, are employed to detect events or processes, loaded 
microparticles are spread on a planar substrate, e.g. a glass slide, for examination with 
a scanning system, such as described in International patent applications 
PCT/US91/09217, PCT/NL90/00081, and PCT/US95/01886. The scanning system 
should be able to reproducibly scan the substrate and to define the positions of each 
microparticle in a predetermined region by way of a coordinate system. In 
polynucleotide sequencing applications, it is important that the positional 
identification of microparticles be repeatable in successive scan steps. 

Such scanning systems may be constructed from commercially available 
components, e.g. x-y translation table controlled by a digital computer used with a 
detection system comprising one or more photomultiplier tubes, or alternatively, a 
CCD array, and appropriate optics, e.g. for exciting, collecting, and sorting 
fluorescent signals. In some embodiments a confocal optical system may be 
desirable. An exemplary scanning system suitable for use in four-color sequencing is 
illustrated diagrammatically in Figure 5. Substrate 300, e.g. a microscope slide with 
fixed microparticles, is placed on x-y translation table 302, which is connected to and 
controlled by an appropriately programmed digital computer 304, which may be any of 
a variety of commercially available personal computers, e.g. 486-based machines or 
PowerPC model 7100 or 8100 available from Apple Computer (Cupertino, CA). 
Computer software for table translation and data collection functions can be provided 
by commercially available laboratory software, such as LabWindows, available from 
National Instruments. 

Substrate 300 and table 302 are operationally associated with microscope 306 
having one or more objective lenses 308 which are capable of collecting and 
delivering light to microparticles fixed to substrate 300. Excitation beam 310 from 
light source 312, which is preferably a laser, is directed to beam splitter 314, e.g. a 
dichroic mirror, which re-directs the beam through microscope 306 and objective lens 
308 which, in turn, focuses the beam onto substrate 300. Lens 308 collects 
fluorescence 316 emitted from the microparticles and directs it through beam splitter 
314 to signal distribution optics 318 which, in turn, directs fluorescence to one or 
more suitable opto-electronic devices for converting some fluorescence characteristic, 
e.g. intensity, lifetime, or the like, to an electrical signal. Signal distribution optics 
318 may comprise a variety of components standard in the art, such as bandpass 
filters, fiber optics, rotating mirrors, fixed position mirrors and lenses, diffraction 
gratings, and the like. As illustrated in Figure 5, signal distribution optics 318 directs 
fluorescence 316 to four separate photomultiplier tubes, 330, 332, 334, and 336, 
whose output is then directed to pre-amps and photon counters 350, 352, 354, and 
356. The output of the photon counters is collected by computer 304, where it can be 
stored, analyzed, and viewed on video 360. Alternatively, signal distribution optics 
318 could be a diffraction grating which directs fluorescent signal 316 onto a CCD 
array. 

The stability and reproducibility of the positional localization in scanning will 
determine, to a large extent, the resolution for separating closely spaced 
microparticles. Preferably, the scanning systems should be capable of resolving 
closely spaced microparticles, e.g. separated by a particle diameter or less. Thus, for 
most applications, e.g. using CPG microparticles, the scanning system should at least 
have the capability of resolving objects on the order of 10-100 µm. Even higher 
resolution may be desirable in some embodiments, but with increased resolution the 
time required to fully scan a substrate will increase; thus, in some embodiments a 
compromise may have to be made between speed and resolution. Reductions in 
scanning time can be achieved by a system which only scans positions where 
microparticles are known to be located, e.g. from an initial full scan. Preferably, 
microparticle size and scanning system resolution are selected to permit resolution of 
fluorescently labeled microparticles randomly disposed on a plane at a density 
between about ten thousand and one hundred thousand microparticles per cm^2. 
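
The relationship between these densities and the 10-100 µm resolution figure can be checked with a short calculation. The Python sketch below converts a surface density into a typical spacing in two ways: the side of the square area available to each microparticle, and the mean nearest-neighbor distance for points scattered at random on a plane (a standard result for a two-dimensional Poisson process). The function names are illustrative, and treating the microparticles as randomly scattered points is an assumption.

```python
import math

UM_PER_CM = 1.0e4  # micrometers per centimeter


def mean_spacing_um(density_per_cm2: float) -> float:
    """Side of the square area available to each microparticle, in micrometers."""
    return math.sqrt(1.0 / density_per_cm2) * UM_PER_CM


def mean_nearest_neighbor_um(density_per_cm2: float) -> float:
    """Mean nearest-neighbor distance for randomly scattered points,
    1 / (2 * sqrt(density)), in micrometers."""
    return 0.5 / math.sqrt(density_per_cm2) * UM_PER_CM


for density in (1e4, 1e5):
    print(density, mean_spacing_um(density), mean_nearest_neighbor_um(density))
```

At ten thousand to one hundred thousand microparticles per cm^2 both measures fall in the range of tens of micrometers, consistent with the resolution stated above.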

In sequencing applications, loaded microparticles can be fixed to the surface 
of a substrate in a variety of ways. The fixation should be strong enough to allow the 
microparticles to undergo successive cycles of reagent exposure and washing without 
significant loss. When the substrate is glass, its surface may be derivatized with an 
alkylamino linker using commercially available reagents, e.g. Pierce Chemical, which 
in turn may be cross-linked to avidin, again using conventional chemistries, to form 
an avidinated surface. Biotin moieties can be introduced to the loaded microparticles 
in a number of ways. For example, a fraction, e.g. 10-15 percent, of the cloning 
vectors used to attach tags to polynucleotides are engineered to contain a unique 
restriction site (providing sticky ends on digestion) immediately adjacent to the 
polynucleotide insert at an end of the polynucleotide opposite of the tag. The site is 
excised with the polynucleotide and tag for loading onto microparticles. After 
loading, about 10-15 percent of the loaded polynucleotides will possess the unique 
restriction site distal from the microparticle surface. After digestion with the 
associated restriction endonuclease, an appropriate double stranded adaptor 
containing a biotin moiety is ligated to the sticky end. The resulting microparticles 
are then spread on the avidinated glass surface where they become fixed via the 
biotin-avidin linkages. 

Alternatively and preferably when sequencing by ligation is employed, in the 
initial ligation step a mixture of probes is applied to the loaded microparticles: a 
fraction of the probes contain a type IIs restriction recognition site, as required by the 
sequencing method, and a fraction of the probes have no such recognition site, but 
instead contain a biotin moiety at their non-ligating ends. Preferably, the mixture 
comprises about 10-15 percent of the biotinylated probe. 

In still another alternative, when DNA-loaded microparticles are applied to a 
glass substrate, the DNA may nonspecifically adsorb to the glass surface upon several 
hours, e.g. 24 hours, incubation to create a bond sufficiently strong to permit repeated 
exposures to reagents and washes without significant loss of microparticles. 
Preferably, such a glass substrate is a flow cell, which may comprise a channel etched 
in a glass slide. Preferably, such a channel is closed so that fluids may be pumped 
through it and has a depth sufficiently close to the diameter of the microparticles so 
that a monolayer of microparticles is trapped within a defined observation region. 

Identification of Novel Polynucleotides 
in cDNA Libraries 

Novel polynucleotides in a cDNA library can be identified by constructing a 
library of cDNA molecules attached to microparticles, as described above. A large 
fraction of the library, or even the entire library, can then be partially sequenced in 
parallel. After isolation of mRNA, and perhaps normalization of the population as 
taught by Soares et al, Proc. Natl. Acad. Sci., 91: 9228-9232 (1994), or like 
references, the following primer may be hybridized to the polyA tails for first strand 
synthesis with a reverse transcriptase using conventional protocols (SEQ ID NO: 1): 

5'-mRNA-[A]n-3' 
        [T]19-[primer site]-GG[W,W,W,C]9ACCAGCTGATC-5' 

where [W,W,W,C]9 represents a tag as described above, "ACCAGCTGATC" is an 
optional sequence forming a restriction site in double stranded form, and "primer site" 
is a sequence common to all members of the library that is later used as a primer 
binding site for amplifying polynucleotides of interest by PCR. 

After reverse transcription and second strand synthesis by conventional 
techniques, the double stranded fragments are inserted into a cloning vector as 
described above and amplified. The amplified library is then sampled and the sample 
amplified. The cloning vectors from the amplified sample are isolated, and the tagged 
cDNA fragments excised and purified. After rendering the tag single stranded with a 
polymerase as described above, the fragments are methylated and sorted onto 
microparticles in accordance with the invention. Preferably, as described above, the 
cloning vector is constructed so that the tagged cDNAs can be excised with an 
endonuclease, such as Fok I, that will allow immediate sequencing by the preferred 
single base method after sorting and ligation to microparticles. 

Stepwise sequencing is then carried out simultaneously on the whole library, 
or one or more large fractions of the library, in accordance with the invention until a 
sufficient number of nucleotides are identified on each cDNA for unique 
representation in the genome of the organism from which the library is derived. For 
example, if the library is derived from mammalian mRNA then a randomly selected 
sequence 14-15 nucleotides long is expected to have unique representation in the 
genome. Identification of far fewer nucleotides would be sufficient for unique 
representation in a library derived from bacteria, or other lower organisms. Preferably, 
at least 20-30 nucleotides are identified to ensure unique representation and to permit 
construction of a suitable primer as described below. The tabulated sequences may 
then be compared to known sequences to identify unique cDNAs. 

Unique cDNAs are then isolated by conventional techniques, e.g. constructing 
a probe from the PCR amplicon produced with primers directed to the primer site and 
the portion of the cDNA whose sequence was determined. The probe may then be 
used to identify the cDNA in a library using a conventional screening protocol. 

The above method for identifying new cDNAs may also be used to fingerprint 
mRNA populations, either in isolated measurements or in the context of a 
dynamically changing population. Partial sequence information is obtained 
simultaneously from a large sample, e.g. ten to a hundred thousand, or more, of 
cDNAs attached to separate microparticles as described in the above method. 

Example 1 

Construction of a Tag Library 
An exemplary tag library is constructed as follows to form the chemically 
synthesized 9-word tags of nucleotides A, G, and T defined by the formula: 

3'-TGGC-[4(A,G,T)9]-CCCCp 

where "[4(A,G,T)9]" indicates a tag mixture where each tag consists of nine 4-mer 
words of A, G, and T; and "p" indicates a 5' phosphate. This mixture is ligated to the 
following right and left primer binding regions (SEQ ID NO: 4 and SEQ ID NO: 5): 

5'- AGTGGCTGGGCATCGGACCG      5'- GGGGCCCAGTCAGCGTCGAT 
    TCACCGACCCGTAGCCp                 GGGTCAGTCGCAGCTA 

          LEFT                           RIGHT 

The right and left primer binding regions are ligated to the above tag mixture, after 
which the single stranded portion of the ligated structure is filled with DNA 
polymerase then mixed with the right and left primers indicated below and amplified 
to give a tag library (SEQ ID NO: 6). 

                Left Primer 

5'- AGTGGCTGGGCATCGGACCG 

5'- AGTGGCTGGGCATCGGACCG-[4(A,G,T)9]-GGGGCCCAGTCAGCGTCGAT 
    TCACCGACCCGTAGCCTGGC-[4(A,G,T)9]-CCCCGGGTCAGTCGCAGCTA 

                                     CCCCGGGTCAGTCGCAGCTA-5' 

                Right Primer 

The underlined portion of the left primer binding region indicates a Rsr II recognition 
site. The left-most underlined region of the right primer binding region indicates 
recognition sites for Bsp 120I, Apa I and Eco O109I, and a cleavage site for Hga I. 
The right-most underlined region of the right primer binding region indicates the 
recognition site for Hga I. Optionally, the right or left primers may be synthesized 
with a biotin attached (using conventional reagents, e.g. available from Clontech 
Laboratories, Palo Alto, CA) to facilitate purification after amplification and/or 
cleavage. 

(plasmid)-5'-AAAAGGAGGAGGCCTTGATAGAGAGGACCT- 

    -CAAATTTG-CCTAGG-AGAAGGAGAAGGAGAAGG- 
     ^        ^ 
     Pme I    Bam HI 
     site     site 


The plasmid is cleaved with Ppu MI and Pme I (to give a Rsr II-compatible end and a 
flush end so that the insert is oriented) and then methylated with DAM methylase. 
The tag-containing construct is cleaved with Rsr II and then ligated to the open 
plasmid, after which the conjugate is cleaved with Mbo I and Bam HI to permit 
ligation and closing of the plasmid. The plasmid is then amplified and isolated and 
used in accordance with the invention. 



Example 3 

Changes in Gene Expression Profiles in Liver Tissue of Rats 
Exposed to Various Xenobiotic Agents 

In this experiment, to test the capability of the method of the invention to 
detect genes induced as a result of exposure to xenobiotic compounds, the gene 
expression profile of rat liver tissue is examined following administration of several 
compounds known to induce the expression of cytochrome P-450 isoenzymes. The 
results obtained from the method of the invention are compared to results obtained 
from reverse transcriptase PCR measurements and immunochemical measurements of 
the cytochrome P-450 isoenzymes. Protocols and materials for the latter assays are 
described in Morris et al, Biochemical Pharmacology, 52: 781-792 (1996). 

Male Sprague-Dawley rats between the ages of 6 and 8 weeks and weighing 
200-300 g are used, and food and water are available to the animals ad lib. Test 
compounds are phenobarbital (PB), metyrapone (MET), dexamethasone (DEX), 
clofibrate (CLO), corn oil (CO), and β-naphthoflavone (BNF), and are available from 
Sigma Chemical Co. (St. Louis, MO). Antibodies against specific P-450 enzymes are 
available from the following sources: rabbit anti-rat CYP3A1 from Human Biologics, 
Inc. (Phoenix, AZ); goat anti-rat CYP4A1 from Daiichi Pure Chemicals Co. (Tokyo, 
Japan); monoclonal mouse anti-rat CYP1A1, monoclonal mouse anti-rat CYP2C11, 
goat anti-rat CYP2E1, and monoclonal mouse anti-rat CYP2B from Oxford 
Biochemical Research, Inc. (Oxford, MI). Secondary antibodies (goat anti-rabbit IgG, 
rabbit anti-goat IgG and goat anti-mouse IgG) are available from Jackson 
ImmunoResearch Laboratories (West Grove, PA). 

Animals are administered either PB (100 mg/kg), BNF (100 mg/kg), MET 
(100 mg/kg), DEX (100 mg/kg), or CLO (250 mg/kg) for 4 consecutive days via 
intraperitoneal injection following a dosing regimen similar to that described by 
Wang et al, Arch. Biochem. Biophys. 290: 355-361 (1991). Animals treated with 
H2O and CO are used as controls. Two hours following the last injection (day 4), 
animals are killed, and the livers are removed. Livers are immediately frozen and 
stored at -70°C. 

Total RNA is prepared from frozen liver tissue using a modification of the 
method described by Xie et al, Biotechniques, 11: 326-327 (1991). Approximately 
100-200 mg of liver tissue is homogenized in the RNA extraction buffer described by 
Xie et al to isolate total RNA. The resulting RNA is reconstituted in 
diethylpyrocarbonate-treated water, quantified spectrophotometrically at 260 nm, and 
adjusted to a concentration of 100 µg/ml. Total RNA is stored in 
diethylpyrocarbonate-treated water for up to 1 year at -70°C without any apparent 
degradation. RT-PCR and sequencing are performed on samples from these 
preparations. 

For sequencing, samples of RNA corresponding to about 0.5 µg of poly(A)+ 
RNA are used to construct libraries of tag-cDNA conjugates following the protocol 
described in the section entitled "Attaching Tags to Polynucleotides for Sorting onto 
Solid Phase Supports," with the following exception: the tag repertoire is constructed 
from six 4-nucleotide words from Table II. Thus, the complexity of the repertoire is 
8^6 or about 2.6 x 10^5. For each tag-cDNA conjugate library constructed, ten samples 
of about ten thousand clones are taken for amplification and sorting. Each of the 
amplified samples is separately applied to a fixed monolayer of about 10^6 10 µm 
diameter GMA beads containing tag complements. That is, the "sample" of tag 
complements in the GMA bead population on each monolayer is about four fold the 
total size of the repertoire, thus ensuring there is a high probability that each of the 
sampled tag-cDNA conjugates will find its tag complement on the monolayer. After 
the oligonucleotide tags of the amplified samples are rendered single stranded as 
described above, the tag-cDNA conjugates of the samples are separately applied to the 
monolayers under conditions that permit specific hybridization only between 
oligonucleotide tags and tag complements forming perfectly matched duplexes. 
Concentrations of the amplified samples and hybridization times are selected to 
permit the loading of about 5 x 10^4 to 2 x 10^5 tag-cDNA conjugates on each bead 
where perfect matches occur. After ligation, 9-12 nucleotide portions of the attached 
cDNAs are determined in parallel by the single base sequencing technique described 
by Brenner in International patent application PCT/US95/03678. Frequency 
distributions for the gene expression profiles are assembled from the sequence 
information obtained from each of the ten samples. 
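
The "about four fold" figure and the resulting high probability of finding a tag complement can be checked numerically. The Python sketch below assumes, for illustration only, that the beads on a monolayer carry tag complements drawn uniformly at random from the repertoire; the variable and function names are hypothetical.

```python
import math


def coverage_probability(beads: float, repertoire: float) -> float:
    """Chance that a given tag complement appears on at least one of `beads`
    beads drawn uniformly at random from the repertoire (Poisson approximation)."""
    return 1.0 - math.exp(-beads / repertoire)


repertoire = 8 ** 6            # six words per tag, eight choices each
beads_per_monolayer = 1.0e6    # beads on one monolayer, as in the example

fold = beads_per_monolayer / repertoire
print(repertoire, fold, coverage_probability(beads_per_monolayer, repertoire))
```

With these figures the bead population is a little under four times the repertoire, and under the stated assumption a given tag complement is present on the monolayer with a probability of roughly 0.98.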

RT-PCRs of selected mRNAs corresponding to cytochrome P-450 genes and 
the constitutively expressed cyclophilin gene are carried out as described in Morris et 
al (cited above). Briefly, a 20 µL reaction mixture is prepared containing 1x reverse 
transcriptase buffer (Gibco BRL), 10 nM dithiothreitol, 0.5 nM dNTPs, 2.5 µM oligo 
d(T)15 primer, 40 units RNasin (Promega, Madison, WI), 200 units RNase H- reverse 
transcriptase (Gibco BRL), and 400 ng of total RNA (in diethylpyrocarbonate-treated 
water). The reaction is incubated for 1 hour at 37°C followed by inactivation of the 
enzyme at 95°C for 5 min. The resulting cDNA is stored at -20°C until used. For 
PCR amplification of cDNA, a 10 µL reaction mixture is prepared containing 10x 
polymerase reaction buffer, 2 mM MgCl2, 1 unit Taq DNA polymerase (Perkin-Elmer, 
Norwalk, CT), 20 ng cDNA, and 200 nM concentration of the 5' and 3' 
specific PCR primers of the sequences described in Morris et al (cited above). PCRs 
are carried out in a Perkin-Elmer 9600 thermal cycler for 23 cycles using melting, 
annealing, and extension conditions of 94°C for 30 sec, 56°C for 1 min., and 72°C 
for 1 min., respectively. Amplified cDNA products are separated by PAGE using 5% 
native gels. Bands are detected by staining with ethidium bromide. 

Western blots of the liver proteins are carried out using standard protocols 
after separation by SDS-PAGE. Briefly, proteins are separated on 10% SDS-PAGE 
gels under reducing conditions and immunoblotted for detection of P-450 isoenzymes 
using a modification of the methods described in Harris et al, Proc. Natl. Acad. Sci., 
88: 1407-1410 (1991). Proteins are loaded at 50 µg/lane and resolved under constant 
current (250 V) for approximately 4 hours at 2°C. Proteins are transferred to 
nitrocellulose membranes (Bio-Rad, Hercules, CA) in 15 mM Tris buffer containing 
120 mM glycine and 20% (v/v) methanol. The nitrocellulose membranes are blocked 
with 2.5% BSA and immunoblotted for P-450 isoenzymes using primary monoclonal 
and polyclonal antibodies and secondary alkaline phosphatase conjugated anti-IgG. 
Immunoblots are developed with the Bio-Rad alkaline phosphatase substrate kit. 

The three types of measurements of P-450 isoenzyme induction showed 
substantial agreement. 

APPENDIX Ia 

Exemplary computer program for generating 
minimally cross hybridizing sets 
(single stranded tag/single stranded tag complement) 
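
The selection rule applied by the program in this appendix can be summarized in a few lines of Python. This is a sketch of the same greedy idea, not a transcription of the listing: starting from a seed subunit, each candidate subunit is kept only if it differs from every subunit already kept in at least a minimum number of positions. The function names, the default three-letter alphabet coded 1 to 3, and the default minimum of three mismatches mirror the listing but are otherwise illustrative.

```python
from itertools import product


def mismatches(a, b):
    """Number of positions at which two equal-length subunits differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_min_cross_hyb_set(seed, alphabet=(1, 2, 3), min_diff=3):
    """Greedily collect subunits that differ from every subunit already kept
    in at least `min_diff` positions, starting from `seed`."""
    kept = [tuple(seed)]
    for candidate in product(alphabet, repeat=len(seed)):
        if all(mismatches(candidate, word) >= min_diff for word in kept):
            kept.append(candidate)
    return kept


words = greedy_min_cross_hyb_set((1, 1, 1, 1))
print(len(words), words)
```

Changing the seed corresponds to one pass of the outer loops in the listing, each of which writes one candidate set.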



      Program minxh
c
c
      integer*2 sub1(6),mset1(1000,6),mset2(1000,6)
      dimension nbase(6)
c
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
100   format(i1)
      open(1,file='sub4.dat',form='formatted',status='new')
c
c
      nset=0
      do 7000 m1=1,3
      do 7000 m2=1,3
      do 7000 m3=1,3
      do 7000 m4=1,3
         sub1(1)=m1
         sub1(2)=m2
         sub1(3)=m3
         sub1(4)=m4
c
c
      ndiff=3
c
c
c        Generate set of subunits differing from
c        sub1 by at least ndiff nucleotides.
c        Save in mset1.
c
c
c        jj counts the subunits stored in mset1
      jj=1
      do 900 j=1,nsub
900      mset1(1,j)=sub1(j)
c
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
c
c
         nbase(1)=k1
         nbase(2)=k2
         nbase(3)=k3
         nbase(4)=k4
c
c
      n=0
      do 1200 j=1,nsub
         if(sub1(j).eq.1 .and. nbase(j).ne.1 .or.
     1      sub1(j).eq.2 .and. nbase(j).ne.2 .or.
     2      sub1(j).eq.3 .and. nbase(j).ne.3) then
            n=n+1
         endif
1200  continue
c
c
      if(n.ge.ndiff) then
c
c
c        If number of mismatches
c        is greater than or equal
c        to ndiff then record
c        subunit in matrix mset
c
c
         jj=jj+1
         do 1100 i=1,nsub
1100        mset1(jj,i)=nbase(i)
      endif
c
c
1000  continue
c
c
      do 1325 j2=1,nsub
         mset2(1,j2)=mset1(1,j2)
1325     mset2(2,j2)=mset1(2,j2)
c
c
c        Compare subunit 2 from
c        mset1 with each successive
c        subunit in mset1, i.e. 3,
c        4,5, ... etc. Save those
c        with mismatches .ge. ndiff
c        in matrix mset2 starting at
c        position 2.
c        Next transfer contents
c        of mset2 into mset1 and
c        start
c        comparisons again this time
c        starting with subunit 3.
c        Continue until all subunits
c        undergo the comparisons.
c
c
      npass=0
c
c
1700  continue
      kk=npass+2
      npass=npass+1
c
c
      do 1500 m=npass+2,jj
         n=0
         do 1600 j=1,nsub
            if(mset1(npass+1,j).eq.1.and.mset1(m,j).ne.1.or.
     1         mset1(npass+1,j).eq.2.and.mset1(m,j).ne.2.or.
     2         mset1(npass+1,j).eq.3.and.mset1(m,j).ne.3) then
               n=n+1
            endif
1600     continue
         if(n.ge.ndiff) then
            kk=kk+1
            do 1625 i=1,nsub
1625           mset2(kk,i)=mset1(m,i)
         endif
1500  continue
c
c        kk is the number of subunits
c        stored in mset2
c
c        Transfer contents of mset2
c        into mset1 for next pass.
c
c
      do 2000 k=1,kk
      do 2000 m=1,nsub
2000     mset1(k,m)=mset2(k,m)
      if(kk.lt.jj) then
         jj=kk
         goto 1700
      endif
c
c
      nset=nset+1
      write(1,7009)
7009  format(/)
      do 7008 k=1,kk
7008     write(1,7010)(mset1(k,m),m=1,nsub)
7010  format(4i1)
      write(*,*)
      write(*,120) kk,nset
120   format(1x,'Subunits in set=',i5,2x,'Set No=',i5)
7000  continue
      close(1)
c
c
      end
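The successive-filtering idea described in the comments of program minxh can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not a transcription of the listing: the function names are hypothetical, subunits are tuples over the codes 1 to 3 as in the listing, and the filtering passes here run until every remaining candidate has served as the reference subunit.

```python
from itertools import product


def mismatches(a, b):
    """Number of positions at which two equal-length subunits differ."""
    return sum(x != y for x, y in zip(a, b))


def minimally_cross_hybridizing_set(seed, ndiff, alphabet=(1, 2, 3)):
    """Build a set of subunits that pairwise differ in at least ndiff positions.

    Hypothetical helper: starts from `seed`, keeps every word that differs
    from it in >= ndiff positions, then repeatedly takes the next surviving
    word as the reference and discards later words that are too similar.
    """
    seed = tuple(seed)
    # First pass: candidates sufficiently different from the seed subunit.
    candidates = [w for w in product(alphabet, repeat=len(seed))
                  if mismatches(seed, w) >= ndiff]
    kept = [seed]
    # Later passes: each surviving candidate in turn becomes the reference.
    while candidates:
        pivot = candidates.pop(0)
        kept.append(pivot)
        candidates = [w for w in candidates if mismatches(pivot, w) >= ndiff]
    return kept


words = minimally_cross_hybridizing_set((1, 1, 1, 1), ndiff=3)
print(len(words), "subunits:", words)
```

Any two subunits returned differ in at least ndiff positions, and no further word over the same alphabet could be added without violating that condition. The result depends on the seed and on the enumeration order, so different seeds may give sets of different sizes.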
APPENDIX Ib

Exemplary computer program for generating
minimally cross hybridizing sets
(single stranded tag/single stranded tag complement)



      Program tagN
c
c     Program tagN generates minimally cross-hybridizing
c     sets of subunits given i) N--subunit length, and ii)
c     an initial subunit sequence. tagN assumes that only
c     3 of the four natural nucleotides are used in the tags.
c
      character*1 sub1(20)
      integer*2 mset(10000,20), nbase(20)
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
100   format(i2)
c
      write(*,*)'ENTER SUBUNIT SEQUENCE'
      read(*,110)(sub1(k),k=1,nsub)
110   format(20a1)
c
      ndiff=10
c
c     Let a=1 c=2 g=3 & t=4
c
      do 800 kk=1,nsub
         if(sub1(kk).eq.'a') then
            mset(1,kk)=1
         endif
         if(sub1(kk).eq.'c') then
            mset(1,kk)=2
         endif
         if(sub1(kk).eq.'g') then
            mset(1,kk)=3
         endif
         if(sub1(kk).eq.'t') then
            mset(1,kk)=4
         endif
800   continue
c
c     Generate set of subunits differing from
c     sub1 by at least ndiff nucleotides.
c
c     jj counts the subunits stored in mset
      jj=1
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
      do 1000 k5=1,3
      do 1000 k6=1,3
      do 1000 k7=1,3
      do 1000 k8=1,3
      do 1000 k9=1,3
      do 1000 k10=1,3
      do 1000 k11=1,3
      do 1000 k12=1,3
      do 1000 k13=1,3
      do 1000 k14=1,3
      do 1000 k15=1,3
      do 1000 k16=1,3
      do 1000 k17=1,3
      do 1000 k18=1,3
      do 1000 k19=1,3
      do 1000 k20=1,3
c
      nbase(1)=k1
      nbase(2)=k2
      nbase(3)=k3
      nbase(4)=k4
      nbase(5)=k5
      nbase(6)=k6
      nbase(7)=k7
      nbase(8)=k8
      nbase(9)=k9
      nbase(10)=k10
      nbase(11)=k11
      nbase(12)=k12
      nbase(13)=k13
      nbase(14)=k14
      nbase(15)=k15
      nbase(16)=k16
      nbase(17)=k17
      nbase(18)=k18
      nbase(19)=k19
      nbase(20)=k20
c
      do 1250 nn=1,jj
         n=0
         do 1200 j=1,nsub
            if(mset(nn,j).eq.1 .and. nbase(j).ne.1 .or.
     1         mset(nn,j).eq.2 .and. nbase(j).ne.2 .or.
     2         mset(nn,j).eq.3 .and. nbase(j).ne.3 .or.
     3         mset(nn,j).eq.4 .and. nbase(j).ne.4) then
               n=n+1
            endif
1200     continue
c
         if(n.lt.ndiff) then
            goto 1000
         endif
1250  continue
c
      jj=jj+1
      write(*,130)(nbase(i),i=1,nsub),jj
      do 1100 i=1,nsub
         mset(jj,i)=nbase(i)
1100  continue
c
1000  continue
c
      write(*,*)
130   format(10x,20(1x,i1),5x,i5)
      write(*,*)
      write(*,120) jj
120   format(1x,'Number of words=',i5)
c
      end
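Program tagN takes a user-supplied starting sequence and accepts each enumerated word only if it differs from every word already accepted by at least ndiff nucleotides. A minimal Python sketch of that accept-or-reject loop follows. The function name, the use of strings rather than integer codes, and the small parameters in the usage line are assumptions made for illustration; they are not taken from the listing.

```python
from itertools import product


def greedy_tag_words(seed, ndiff, alphabet="acg"):
    """Greedily collect words that differ pairwise in >= ndiff positions.

    Hypothetical helper. `seed` is the initial subunit sequence; it may use
    any of a, c, g, t, while candidate words are drawn only from `alphabet`
    (three of the four natural nucleotides, as the listing assumes).
    """
    words = [seed]
    for letters in product(alphabet, repeat=len(seed)):
        word = "".join(letters)
        # Accept the candidate only if it is far enough from every kept word.
        if all(sum(x != y for x, y in zip(word, kept)) >= ndiff
               for kept in words):
            words.append(word)
    return words


words = greedy_tag_words("aaaa", ndiff=3)
print("Number of words =", len(words))
print(words)
```

For realistic subunit lengths the full enumeration grows as 3 to the power N, so a practical implementation would likely need pruning or sampling; the sketch keeps the exhaustive loop only because it is easiest to compare with the listing.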
APPENDIX Ic

Exemplary computer program for generating
minimally cross hybridizing sets
(double stranded tag/single stranded tag complement)



      Program 3tagN
c
c     Program 3tagN generates minimally cross-hybridizing
c     sets of duplex subunits given i) N--subunit length,
c     and ii) an initial homopurine sequence.
c
      character*1 sub1(20)
      integer*2 mset(10000,20), nbase(20)
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
100   format(i2)
c
      write(*,*)'ENTER SUBUNIT SEQUENCE a & g only'
      read(*,110)(sub1(k),k=1,nsub)
110   format(20a1)
c
      ndiff=10
c
c     Let a=1 and g=2
c
      do 800 kk=1,nsub
         if(sub1(kk).eq.'a') then
            mset(1,kk)=1
         endif
         if(sub1(kk).eq.'g') then
            mset(1,kk)=2
         endif
800   continue
c
c     jj counts the subunits stored in mset
      jj=1
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
      do 1000 k5=1,3
      do 1000 k6=1,3
      do 1000 k7=1,3
      do 1000 k8=1,3
      do 1000 k9=1,3
      do 1000 k10=1,3
      do 1000 k11=1,3
      do 1000 k12=1,3
      do 1000 k13=1,3
      do 1000 k14=1,3
      do 1000 k15=1,3
      do 1000 k16=1,3
      do 1000 k17=1,3
      do 1000 k18=1,3
      do 1000 k19=1,3
      do 1000 k20=1,3
c
      nbase(1)=k1
      nbase(2)=k2
      nbase(3)=k3
      nbase(4)=k4
      nbase(5)=k5
      nbase(6)=k6
      nbase(7)=k7
      nbase(8)=k8
      nbase(9)=k9
      nbase(10)=k10
      nbase(11)=k11
      nbase(12)=k12
      nbase(13)=k13
      nbase(14)=k14
      nbase(15)=k15
      nbase(16)=k16
      nbase(17)=k17
      nbase(18)=k18
      nbase(19)=k19
      nbase(20)=k20
c
      do 1250 nn=1,jj
         n=0
         do 1200 j=1,nsub
            if(mset(nn,j).eq.1 .and. nbase(j).ne.1 .or.
     1         mset(nn,j).eq.2 .and. nbase(j).ne.2 .or.
     2         mset(nn,j).eq.3 .and. nbase(j).ne.3 .or.
     3         mset(nn,j).eq.4 .and. nbase(j).ne.4) then
               n=n+1
            endif
1200     continue
c
         if(n.lt.ndiff) then
            goto 1000
         endif
1250  continue
c
      jj=jj+1
      write(*,130)(nbase(i),i=1,nsub),jj
      do 1100 i=1,nsub
         mset(jj,i)=nbase(i)
1100  continue
c
1000  continue
c
      write(*,*)
130   format(10x,20(1x,i1),5x,i5)
      write(*,*)
      write(*,120) jj
120   format(1x,'Number of words=',i5)
c
      end
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: David W. Martin, Jr.

(ii) TITLE OF INVENTION: ... expression profiles in ...

(iii) NUMBER OF SEQUENCES: 7

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Stephen C. Macevicz, Lynx Therapeutics, Inc.

(B) STREET: 3832 Bay Center Place

(C) CITY: Hayward

(D) STATE: California

(E) COUNTRY: USA

(F) ZIP: 94545

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: 3.5 inch diskette

(B) COMPUTER: IBM compatible

(C) OPERATING SYSTEM: Windows 3.1

(D) SOFTWARE: Microsoft Word 5.1

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: PCT/US96/09513

(B) FILING DATE: 06-JUN-96

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: PCT/US95/12791

(B) FILING DATE: 12-OCT-95

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Stephen C. Macevicz

(B) REGISTRATION NUMBER: 30,285

(C) REFERENCE/DOCKET NUMBER: 813wo

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (510) 670-9365

(B) TELEFAX: (510) 670-9302



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



CTAGTCGACC A 



(2) INFORMATION FOR SEQ ID NO: 2:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 



NRRGATCYNN N 



(2) INFORMATION FOR SEQ ID NO: 3: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



GAGGATGCCT TTATGGATCC ACTCGAGATC CCAATCCA 



(2) INFORMATION FOR SEQ ID NO: 4: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



AGTGGCTGGG CATCGGACCG 



(2) INFORMATION FOR SEQ ID NO: 5: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 nucleotides 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



GGGGCCCAGT CAGCGTCGAT 



20 



(2) INFORMATION FOR SEQ ID NO: 6: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 



ATCGACGCTG ACTGGGCCCC 



20



(2) INFORMATION FOR SEQ ID NO: 7:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 62 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:



AAAAGGAGGA GGCCTTGATA GAGAGGACCT GTTTAAACGG ATCCTCTTCC 



TCTTCCTCTT C r 
I claim:

1. A method of determining the toxicity of a compound, the method comprising
the steps of:

administering the compound to a test organism;

extracting a population of mRNA molecules from each of one or more tissues
of the test organism;

forming a separate population of cDNA molecules from each population of
mRNA molecules from the one or more tissues such that each cDNA molecule of a
separate population has an oligonucleotide tag attached, the oligonucleotide tags
being selected from the same minimally cross-hybridizing set;

separately sampling each population of cDNA molecules such that 
substantially all different cDNA molecules within a separate population have different 
oligonucleotide tags attached; 

sorting the cDNA molecules of each separate population by specifically 
hybridizing the oligonucleotide tags with their respective complements, the respective 
complements being attached as uniform populations of substantially identical 
complements in spatially discrete regions on one or more solid phase supports; 

determining the nucleotide sequence of a portion of each of the sorted cDNA 
molecules of each separate population to form a frequency distribution of expressed 
genes for each of the one or more tissues; and 

correlating the frequency distribution of expressed genes in each of the one or 
more tissues with the toxicity of the compound. 

2. The method of claim 1 wherein said oligonucleotide tag and said complement 
of said oligonucleotide tag are single stranded. 

3. The method of claim 2 wherein said oligonucleotide tag consists of a plurality 
of subunits, each subunit consisting of an oligonucleotide of 3 to 9 nucleotides in 
length and each subunit being selected from the same minimally cross-hybridizing set. 

4. The method of claim 3 wherein said one or more solid phase supports are 
microparticles and wherein said step of sorting said cDNA molecules onto the 
microparticles produces a subpopulation of loaded microparticles and a subpopulation 
of unloaded microparticles. 

5. The method of claim 4 further including a step of separating said loaded 
microparticles from said unloaded microparticles. 
6. The method of claim 5 further including a step of repeating said steps of
sampling, sorting, and separating until the number of said loaded microparticles
accumulated is at least 10,000.

7. The method of claim 6 wherein said number of loaded microparticles is at
least 100,000.

8. The method of claim 7 wherein said number of loaded microparticles is at
least 500,000.

9. The method of claim 5 further including a step of repeating said steps of
sampling, sorting, and separating until the number of said loaded microparticles
accumulated is sufficient to estimate the relative abundance of a cDNA molecule
present in said population at a frequency within the range of from 0.1% to 5% with a
95% confidence limit no larger than 0.1% of said population.

10. The method of claim 4 wherein said test organism is a mammalian tissue
culture.

11. The method of claim 10 wherein said mammalian tissue culture comprises
hepatocytes.

12. The method of claim 4 wherein said test organism is an animal selected from
the group consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and
monkeys.

13. The method of claim 12 wherein said one or more tissues are selected from the
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large
intestine, small intestine, pancreas, urinary bladder, stomach, ovary, testes, and
mesenteric lymph nodes.



14. A method of identifying genes which are differentially expressed in a selected 
tissue of a test animal after treatment with a compound, the method comprising the 
steps of: 

administering the compound to a test animal; 
extracting a population of mRNA molecules from the selected tissue of the
test animal;

forming a population of cDNA molecules from the population of mRNA
molecules such that each cDNA molecule has an oligonucleotide tag attached, the
oligonucleotide tags being selected from the same minimally cross-hybridizing set;

sampling the population of cDNA molecules such that substantially all
different cDNA molecules have different oligonucleotide tags attached;

sorting the cDNA molecules by specifically hybridizing the oligonucleotide
tags with their respective complements, the respective complements being attached as
uniform populations of substantially identical complements in spatially discrete
regions on one or more solid phase supports;

determining the nucleotide sequence of a portion of each of the sorted cDNA
molecules to form a frequency distribution of expressed genes; and

identifying genes expressed in response to administering the compound by
comparing the frequency distribution of expressed genes of the selected tissue of the
test animal with a frequency distribution of expressed genes of the selected tissue of a
control animal.

15. The method of claim 14 wherein said oligonucleotide tag and said
complement of said oligonucleotide tag are single stranded.

16. The method of claim 15 wherein said oligonucleotide tag consists of a
plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 9
nucleotides in length and each subunit being selected from the same minimally
cross-hybridizing set.

17. The method of claim 16 wherein said one or more solid phase supports are
microparticles and wherein said step of sorting said cDNA molecules onto the
microparticles produces a subpopulation of loaded microparticles and a subpopulation
of unloaded microparticles.

18. The method of claim 17 further including a step of separating said loaded
microparticles from said unloaded microparticles.

19. The method of claim 18 further including a step of repeating said steps of
sampling, sorting, and separating until the number of said loaded microparticles
accumulated is at least 10,000.

20. The method of claim 19 wherein said number of loaded microparticles is at
least 100,000.

21. The method of claim 20 wherein said number of loaded microparticles is at
least 500,000.

22. The method of claim 18 further including a step of repeating said steps of
sampling, sorting, and separating until the number of said loaded microparticles
accumulated is sufficient to estimate the relative abundance of a cDNA molecule
present in said population at a frequency within the range of from 0.1% to 5% with a
95% confidence limit no larger than 0.1% of said population.

23. The method of claim 17 wherein said test animal is selected from the group
consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and monkeys.

24. The method of claim 23 wherein said selected tissue is selected from the
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large
intestine, small intestine, pancreas, urinary bladder, stomach, ovary, testes, and
mesenteric lymph nodes.

25. A use of the technique of massively parallel signature sequencing to determine 
the toxicity of a compound in a test organism, the use comprising the steps of: 

administering the compound to a test organism; 

extracting a population of mRNA molecules from each of one or more tissues 
of the test organism and forming a population of cDNA molecules for each of the one 
or more tissues; 

determining the nucleotide sequence of a portion of each of the cDNA 
molecules of each separate population using massively parallel signature sequencing 
to form a frequency distribution of expressed genes for each of the one or more 
tissues; and 

correlating the frequency distribution of expressed genes in each of the one or 
more tissues with the toxicity of the compound. 



26. The use of claim 25 wherein said test organism is a mammalian
tissue culture.

27. The use of claim 26 wherein said mammalian tissue culture comprises
hepatocytes.

28. The use of claim 25 wherein said test organism is an animal selected from the
group consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and
monkeys.

29. The use of claim 28 wherein said one or more tissues are selected from the
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large
intestine, small intestine, pancreas, urinary bladder, stomach, ovary, testes, and
mesenteric lymph nodes.

30. A use of the technique of massively parallel signature sequencing to identify
genes which are differentially expressed in a test organism after treatment with a
compound and which are correlated with toxicity of the compound, the use
comprising the steps of:

administering the compound to the test organism;

extracting a population of mRNA molecules from a selected tissue of the test
organism and forming a population of cDNA molecules;

determining the nucleotide sequence of a portion of each of the cDNA
molecules using massively parallel signature sequencing to form a frequency
distribution of expressed genes;

identifying genes expressed in response to administering the compound by
comparing the frequency distribution of expressed genes of the selected tissue of the
test organism with a frequency distribution of expressed genes of the selected tissue
of a control organism; and

determining whether the genes expressed in response to administering the
compound are correlated with toxicity of the compound in the test organism.
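
The sampling-depth condition recited in claims 9 and 22 can be made concrete with a small calculation. The Python sketch below rests on assumptions not stated in the claims: it reads the "95% confidence limit no larger than 0.1% of said population" as a half-width of 0.001 on the estimated frequency, and it uses the normal approximation to the binomial distribution. The function name tags_needed is hypothetical.

```python
import math


def tags_needed(frequency, half_width=0.001, z=1.96):
    """Smallest sample size n with z * sqrt(p * (1 - p) / n) <= half_width.

    Assumes a normal approximation to the binomial; z = 1.96 corresponds
    to a two-sided 95% interval.
    """
    if not 0.0 < frequency < 1.0:
        raise ValueError("frequency must lie strictly between 0 and 1")
    variance = frequency * (1.0 - frequency)
    return math.ceil(z * z * variance / half_width ** 2)


# Frequencies spanning the claimed range of 0.1% to 5%.
for f in (0.001, 0.01, 0.05):
    print(f"frequency {f:.1%}: about {tags_needed(f)} sequenced cDNAs")
```

Under these assumptions the requirement is driven by the upper end of the range: a cDNA present at 5% needs on the order of 1.8 x 10^5 sampled molecules, while one present at 0.1% needs only a few thousand.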
Pharmagene Raises More Capital for Research on Human Tissues

By Sophia Fox

Pharmagene, the Royston, U.K.-based biopharmaceutical company specialising in
the use of human biomaterials for drug discovery research, has raised a further
£5 million from a group of investors led by 3i and Abacus Nominees. The funding
will enable the company to expand both its human biomaterials collection and
its capabilities across a range of proprietary platform technologies.

Gordon Baxter, Ph.D., Pharmagene's cofounder and chief operating officer,
claimed, "By the end of this year Pharmagene will have access to the largest
collection of human RNAs and proteins anywhere in the world, and a range of
innovative, yet robust technologies
SEE PHARMAGENE, P. • 
Perkin-Elmer Acquires PerSeptive to Expand
Its Capabilities in Gene-Based Drug Discovery

By John Sterling

Perkin-Elmer's (PE; Norwalk, CT) decision last month to acquire PerSeptive
Biosystems (Framingham, MA) via a $360 million stock swap was designed to
strengthen PE in terms of broad capabilities in gene-based drug discovery. The
company's main goal is to develop new products to improve the integration of
genetic and protein research.

"This merger will enhance our position as an effective provider of innovative,
integrated platforms enabling our customers to be more efficient and
cost-effective in bringing new pharmaceuticals to market," says Tony L. White,
PE's chairman, president and CEO. "The combination of our two companies should
bolster our presence in the life sciences, [and it is our] belief that we must
take bold action now to lead the emerging era of molecular medicine with
leading positions in both genetic and protein analysis."

A driving force behind the merger is the vast amount of genetic information
about human disease that is being accumulated by researchers and biotech
companies working in the area of genomics. It is becoming increasingly obvious
that these data need to be complemented with technologies for studying proteins
and protein networks, a field known as proteomics (GEN, September 1, 1997).

PE officials, who claim that MALDI-TOF (Matrix Assisted
SEE ACQUISITION, P. 10

[Photo caption, partly illegible:] ... PerSeptive Biosystems for $360 ...
obtain new ... spectrometry, bioseparations and purification for product
development projects, spanning the range from genomics to proteomics.



FDA OKs Genzyme's Carticel
Product for Damage to Knees

[Figure labels: Periosteal flap; Genzyme Tissue Repair; Cell Processing]

Carticel, which was approved for the repair of clinically significant,
symptomatic cartilaginous defects of the femoral condyle (medial, lateral or
trochlear) caused by acute or repetitive trauma, employs a proprietary
process to grow autologous cartilage cells for implantation.

By Naomi Pfeiffer

The FDA has approved a knee-cartilage replacement product made by Genzyme
Tissue Repair (Cambridge, MA), a tracking-stock division of Genzyme Corp., for
people with trauma-damaged knees.

Carticel (autologous cultured chondrocytes) is the first product to be licensed
under the FDA's pro-
SEE GENZYME, P. 6



Strategies for Target Validation
Streamline Evaluation of Leads

By Vicki Glaser

Acacia Biosciences (Richmond, CA) last month [announced] its first agreement
with a major pharmaceutical company, signing a deal with Eli Lilly
(Indianapolis, IN) to use Acacia's Genome Reporter Matrix (GRM) to select and
optimize some of Lilly's lead compounds. Acacia's yeast-based system for
profiling drug activity is useful for evaluating the therapeutic potential of
lead compounds, and it also has a role in the identification and validation of
new drug targets.

"We're using the ecosystem of a cell to allow us to deduce the mechanism of
action and target for any chemical," explains Bruce Cohen, president and CEO.
"We screen for every target in a cell simultaneously... using transcription as
a readout for how a cell is adapting to any perturbation," he says.

The GRM technology consists of two main databases: one is the genetic response
profile, showing the effects of mutations in each individual yeast gene and
compensatory gene regulatory mechanisms; the other is the chemical response
profile, which documents changes in gene expression in response to chemical
compounds. Computational analysis and pattern matching between the genetic and
chemical profiles yields information on the specificity, potency and
side-effects risk of a drug lead.

Targeting Targets

No longer is mapping and sequencing a gene, or the human genome, an end unto
itself, but
SEE TARGET, P. 18



Sticky Ends

Avigen received two grants from the NIH & University of California for
research on gene therapy for treatment of cancer & HIV infections...KRL
Pharmaceutical Services, of Reston, VA, launched the TSN Bug Finder, which is
able to locate & retrieve client-specified microorganisms in
real-time...Gensia Sicor, Inc. will move its corporate staff from San Diego to
Irvine, CA, by end of year...

FDA accepted NDA from Sepracor for levalbuterol HCl inhalation solution...An
$11.7M mezzanine financing has been closed by Activated Cell Therapy, which
changed its name to Dendreon Corporation...Astra AB will build major research
facility in Waltham, MA, and is also relocating Astra Arcus research facility
from Rochester to Boston area...Prolifix Ltd. team used a small peptide to
inhibit the E2F protein complex and induced apoptosis in mammalian tumor
cells...Vertex Pharmaceuticals, Inc. and Alpha Therapeutic Corp. ended an
agreement to develop VX-366 for treatment of inherited hemoglobin
disorders...NaviCyte received Phase I SBIR grant for up to $100,000 from NIH
for development of prototype of its NaviFlow technology for high-throughput
screening...Covance Inc. will invest $21 million in expansion and renovation
of its facility in Indianapolis, IN.
merely a means to an end. The critical next step is to validate the gene and
its protein product as a potential drug target. The Human Genome Project
continues to produce a treasure chest of expressed sequence tags (ESTs) and a
tantalizing array of complete gene sequences.

Companies are applying a variety of functional genomic strategies to link genes
to specific diseases and to multigenic phenotypes. Yet the ultimate challenge
for pharmaceutical companies is to sift through all the sequence and
differential gene expression data to identify the best targets for drug
discovery.

Spinning off technology developed at the University of North Carolina (Chapel
Hill), Cytogen Corp. (Princeton, NJ) formed its wholly owned subsidiary AxCell
Biosciences earlier this year. The young company is building a protein
interaction database, cataloging all the interactions the modular domains of
proteins can engage in with a range of ligands, in order to gain insight into
protein function and to select the most critical interaction to target for
drug development.

AxCell's cloning-of-ligand-targets (COLT) technology employs "recognition
units" from the company's genetic diversity library (GDL) to map functional
protein interactions and quantitate their affinity. The company's
inter-functional proteomic database (IFP-dbase) elucidates protein interaction
networks and structure-activity relationships based on ligand affinity with
protein modular domains.

Defining Disease Pathways

Signal Pharmaceuticals, Inc.'s (San Diego, CA) integrated drug target and
discovery effort is based on mapping gene-regulating pathways in cells and
identifying small molecules that regulate the activation of those genes. In
collaboration with academic researchers, the company has identified a large
number of regulatory proteins in several mitogen-activated protein (MAP)
kinase pathways (including the JNK, ERK and p38
[Figure caption, partly illegible:] The Genome Reporter Matrix depicts ...
yeast array. Each colony harbors a GFP ... a single gene. Collectively, the
array reports the expression ... A: Array in visible light. B: Image of
fluorescent emission from the ... (Acacia)



signaling pathways), which Signal is evaluating for the treatment of
autoimmune, inflammatory, cardiovascular and neurologic diseases, and cancer.
Other target identification programs focus on the NF-kB pathway,
estrogen-related genes and central/peripheral nervous system genes.

Regulating cytokine production in immune and inflammatory disorders



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and modifying bone metabolism to 
treat osteoporosis arc the focus of 
Signals collaboration with Tanabc 
Sefyaku (Osaka. Japanl Signal has 
partnered with Organon/Akzo 
Nonet (Netherlands) to identify 
estrogen-responsive genes as targets 
for treating neurodegeneranve and 
r*ychiatric diseases, at h erosclero s is 
and ischemia, and with Roche 
Bioscience < Palo Alio, CA) lo devel- 
op human ncrinhcral nerve cell lines 
for the discovery of treatments for 
pain and incontinence. 

Exrtixb' (S. San Francisco. CA) 
strategy for target "selection is to 
define disease pathways and identity 
regulatory molecules that activate or 
inhibit those biochemical/genetic 
pathways. Based on the finding that 
these pathways are conserved across 
species, the company is studying the 
model genetic systems of Drosopnila 
and Caenorhabditis elegant. Using 
its PathFinder technology, Exdixis 
systematically introduces mutations 
into the genomes of these model 
organisms, looking for mutations 
mat enhance or suppress the target 
disease-related gene. These novel 
genes then become the basis of drug 
screening assays. 

Cadus Pharmaceutical Corp. 
(Tarrytown, NY) is identifying sur- 
rogate ligands to newly discovered 
orphan G-protein coupled trans- 
membrane receptors of unknown 
function to determine the suitability 
of the receptors as drug targets. 
Inserting the novel receptor m a 
yeast system yields a ligand that 
activates the receptor. Access to a 
surrogate ligand allows the company 
to screen for receptor antagonists in 
the yeast system. 

"The antagonist plus the surro- 
gate ligand gives you two probes — 
an on probe and an off probe — 
which allows you to look at func- 
tion," explains David Webb, Ph.D., 
vp of research and chief scientific 
officer. A surrogate ligand also pro- 
vides information on which G-pro- 
tein interacts with the orphan recep- 
tor and its associated signaling path- 
ways, further clarifying the role of 
the receptor as a potential drug tar- 
get. Cadus' collaboration with 
SmithKline (Philadelphia) capital- 
izes on Cadus' ability to determine 
orphan receptor function, applying 
the technology to SmithKline's pro- 
prietary, newly discovered G-pro- 
tein receptors. 

Cadus' recombinant yeast system 
can also be used to screen cell and 
tissue extracts for natural ligands, 
and the company is accelerating its 
internal drug-discovery efforts in the 
areas of cancer, inflammation and 
allergy. A recent equity investment in 
Axiom Biotechnologies (San Diego, 
CA) gave Cadus a license to Axiom's 
high-throughput pharmacologic 
screening system for lead optimiza- 
tion and discovery. 

As its name implies, 
gene/Networks (Alameda, CA) 
focuses on identifying gene networks 
that contribute to multigenic pheno- 
types and complex disease process- 
es. The integration of mouse and 
human genetic studies forms the 
basis of the technology. The Genome 
Tagged Mice database in develop- 
ment will serve as a library of natur- 
al mouse genetic and phenotypic 
variation. Disease-related genes 
identified in mice are then evaluated 
in human family- and population- 
based studies to confirm their clini- 
cal relevance and linkages to patho- 
physiologic traits. 

Blocking Gene Expression 

Inactivating a gene known to be 
expressed in association with a par- 
ticular disease is one approach to 
identifying appropriate therapeutic 
targets. The target validation and dis- 
covery program at Ribozyme 
Pharmaceuticals, Inc. (Boulder, 
CO) applies the company's ribozyme 
technology to achieve selective inhi- 
bition of gene expression in cell cul- 
ture and in animals. 

Correlation of the gene expres- 
sion inhibition with phenotype can 
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suggest the relative importance of 
the gene in disease pathology. The 
company's nuclease-resistant 
ribozymes form the basis of a col- 
laboration with Schering AG 
(Germany) for drug target validation 
and the development of ribozyme- 
based therapeutic agents, and with 
Chiron Corp. (Emeryville, CA) for 
target validation. 

With several antisense compounds 
now progressing through clinical tri- 
als, the concept of using oligonu- 
cleotides to inhibit gene activity is 
not new. But rather than focusing on 
therapeutics development, Sequitur, 
Inc. (Natick, MA) is creating anti- 
sense compounds for the purpose of 
determining gene function and vali- 
dating drug targets. Clients typically 
provide the one-year-old company 
with the sequence (or EST) of a 
potential gene target and, in return, 
Sequitur custom designs a series of 
three to six antisense compounds that 
yield a three- to ten-fold inhibition of 
the target gene in cell culture. The 
company also provides oligofectins, 
a series of cationic lipids, to deliver 
the oligonucleotides to a variety of 
cultured cells. 

"Differential expression informa- 
tion is just for correlation, it doesn't 
tell function or confirm what would 
be a good target," says Tod Woolf, 
Ph.D., director of technology devel- 
opment at Sequitur. Whereas, anti- 
sense compounds will inhibit a tar- 
get. Sequitur offers both phospho- 
rothioate DNA antisense com- 
pounds, and its proprietary Next 
Generation chimeric oligonu- 
cleotides, which have a higher 
hybridization affinity, greater speci- 
ficity and reduced toxicity, according 
to the company. 

Mining Pathogen Genomes 

Companies such as Human 
Genome Sciences (HGS; Rockville, 
MD), Incyte (Palo Alto, CA), 




AxCell Biosciences scientists say their technology enables the rapid and 
simple functional identification of the two essential molecular components 
of protein interaction networks: specific recognition units that bind distinct 
modular protein domains are identified and isolated using a combination 
structural/functional approach that uses both peptide phage display Genetic 
Diversity Libraries (GDL) and bioinformatics, and Cloning of Ligand 
Targets (COLT) technology utilizes recognition units as functional probes to 
isolate families of interactor proteins. 



Millennium Pharmaceuticals Inc. 
(Cambridge, MA) and Genome 
Therapeutics (Waltham, MA) are 
relying on high-speed DNA sequenc- 
ing, positional cloning and other 
strategies to identify specific micro- 
bial genomic sites that would be 
good targets for infectious disease 
therapeutics. 

HGS recently completed sequenc- 
ing of the bacterial pathogen 
Streptococcus pneumoniae, which is 
the focus of an agreement with 
Hoffmann-La Roche (Basel, 
Switzerland). Roche will use the 
sequence data to develop new anti- 
infectives against S. pneumoniae. 
HGS and Roche have expanded their 
collaboration to include a nonexclu- 
sive license to access sequence infor- 
mation for the intestinal bacterium 
Enterococcus faecalis. 

Incyte Pharmaceuticals has com- 
pleted one-fold coverage of the 
Candida albicans genome, identify- 
ing 60% of the genes of this fungal 
pathogen. This genome will become 
part of the company's PathoSeq 
microbial database. Incyte recently 
introduced the ZooSeq animal gene 
sequence and expression database. 
The database will provide genomic 
information across various species 
commonly used in preclinical drug 
testing, which may help to better 
define potential drug targets. 

Millennium Pharmaceuticals con- 
tinues to report success in identifying 
novel drug targets, having recently 
discovered a novel chemokine called 
neurotactin and a new class of MAD- 
related proteins that inhibit trans- 
forming growth factor beta (TGF-β) 
signaling. The company also 
received U.S. patent coverage for the 
tub genes, believed to play a role in 
obesity, and for the gene that encodes 
the protein melastatin, which appears 
to suppress metastasis in malignant 
melanoma. ■ 




HIGH SPECIFIC ACTIVITY 
MICROBIAL ALKALINE 
PHOSPHATASE 
from Biocatalysts 

Biocatalysts Limited, the British speciality enzyme 
company, has developed a completely new type of 
alkaline phosphatase with many advantages over the 
types most commonly used. 
It is of microbial origin with a high specific activity 
(unlike that from E. coli) and with higher temperature and 
storage stability compared to that from calf intestine. 
This is the first of several new generation diagnostic 
enzymes being developed by Biocatalysts Limited with 
greatly improved stability. 

• Non-animal source, no risk of BSE or animal 
virus contamination 

• Higher temperature stability than calf intestine 

• Much higher specific activity than from E. coli 

• Very high storage stability even in the absence 
of glycerol 

For further details on alkaline phosphatase 
diagnostic enzymes contact us direct at the address below or 
within North America contact our US Distributor Kaltron-Pettibone 
'phone: 630 350 1116 or fax 630-350-1606 
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Smith, now a computer program- 
mer, is an expert in systems integra- 
tion, internet technologies and the 
application of industrial engineering 
principles to the drug discovery 
process. Before co-founding Pangea, 
he was the manager of software 
development at Attorney's Briefcase, 
a legal research software company. 

By being "in the trenches" with 
customers and collaborators, 
Bellenson and Smith sensed the 
frustration of pharmaceutical 
researchers whose incompatible 
tools have impeded their progress. 
According to Bellenson, "Most of 
them are geared toward analyzing 
one molecule at a time. It's like emp- 
tying the ocean with an eye drop- 
per — an incompatible eye dropper at 
that. A pharmaceutical company 
may have 30 different drug discov- 
ery teams with various approaches. 
The problem is to manage the 
process of experimenting with a lot 
of different approaches, to automate 
while maintaining flexibility." 

Gene World 2.1 enables "integra- 
tion of the entire target discovery and 
validation process," Bellenson says. 
The commercial software package 
coordinates the entire process of 
sequence-data analysis and can be 
integrated with other programs and 
databases, according to Smith, who 
adds that it handles thousands of 
sequence results, organizes and auto- 
mates annotation and seamlessly 
interacts with growing genome data- 
bases. Simple forms and menus 
enable users to turn raw sequence 
data into crucial knowledge for drug 
discovery by applying algorithms to 
sequences, creating custom analysis 
strategies and producing useful 
reports, without the need for writing 
computer code. Gene World 2.1 runs 
on a variety of platforms and operat- 
ing systems. 

Pairing industrial relational data- 
base-management systems with a 
web-browser interface, Pangea's 
Operating System of Drug 
Discovery™ is an open-computing 
framework that allows client/server 
and Java-enabled web-based tech- 
nologies to collect, organize and ana- 
lyze drug discovery information for 
pharmaceutical companies to simpli- 
fy and accelerate drug discovery. The 
technology unites automated 
genomics database analysis for drug 
target site selection, chemical infor- 
mation database analysis and large- 
scale combinatorial chemistry pro- 
ject management and high-through- 
put screening project management 
for drug lead efficacy analysis. 
Pangea officials maintain that these 
integrated elements provide a unified 
environment for chemists, biologists 
and others involved in the drug dis- 
covery process to work together with 
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Bioinformaticists can design and 
save Strategies, such as the one 
shown here, that forward data 
through multiple-step analyses 
logically and automatically. 
Researchers throughout your 
organization can apply the same 
Strategies to their own data. 



commercial and public domain 
software. 

Pangea's Operating System of 
Drug Discovery can accommodate 
Sybase, Oracle or Informix relation- 
al database-management systems 
and any version of UNIX. It absorbs 
new data formats, databases, algo- 
rithms and analysis paradigms into 
the automated workflow without 
software modifications. Netscape 
Navigator™ provides a friendly user 
interface from PC, Macintosh, and 
UNIX workstations. 

In the near term, Pangea plans to 
complete its bioinformatics core 
with two more programs. Gene 
Foundry, a sample tracking and 
workflow sequence package for 
DNA sequence and fragment infor- 
mation, will also offer interaction 
with robots, reagent tracking and 
troubleshooting. Gene Thesaurus, 
the other package, is a "warehouse 
of bioinformatics data," says 
Bellenson. ■ 
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GTAC Chairman, Professor 
Norman C. Nevin, said 1996 saw 
"four important developments": an 
increase in enquiries and submis- 
sions made to GTAC; an increase in 
the complexity of submitted proto- 
cols; a continuing shift from gene 
therapy for single-gene disorders 
toward strategies aimed at tumour 
destruction in cancer; and a growth 
in international sponsorship of U.K. 
gene therapy trials. 

Since 1993, GTAC and its prede- 
cessor, the Clothier Committee, have 
approved 18 U.K. gene therapy clini- 
cal trials (13 of which have been car- 
ried out), which are listed in the 
report. The disease areas targeted by 
these trials include severe combined 
immunodeficiency (1 trial), cystic 
fibrosis (6), metastatic melanoma (2), 
lymphoma (2), neuroblastoma (1), 
breast cancer (1), Hurler's syndrome 
(1), cervical cancer (1), glioblastoma 



breast cancer, breast cancer with liver 
metastases, glioblastoma, malignant 
ascites due to gastrointestinal cancer 
and ovarian cancer. 

Copies of the GTAC third annual 
report are available from the GTAC 
Secretariat, Wellington House, 133- 
155 Waterloo Road, London SE1 
8UG, U.K. 

Coated Lenses Prevent PCO 

Scientists in the U.K. say it may be 
possible to prevent posterior capsule 
opacification (PCO), a common 
complication following cataract 
surgery, by using the implanted poly- 
methylmethacrylate (PMMA) 
intraocular lens as a drug delivery 
system. PCO occurs in 30-50% of 
cataract surgery patients as a result of 
stimulated cell growth within the 
remaining capsular bag. The condi- 
tion causes a decline in visual acuity 
and requires expensive laser treat- 
ment, thus negating the routine use of 
cataract surgery in underdeveloped 
countries, explains G. Duncan, at the 



Docket No.: PF-0300-3 CON 
USSN: 09/745,506 
Ref.No. 10 of 19 






Exploring the Metabolic and Genetic Control of 
Gene Expression on a Genomic Scale 

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown* 

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used 
to carry out a comprehensive investigation of the temporal program of gene expression 
accompanying the metabolic shift from fermentation to respiration. The expression 
profiles observed for genes with known metabolic functions pointed to features of the 
metabolic reprogramming that occur during the diauxic shift, and the expression patterns 
of many previously uncharacterized genes provided clues to their possible functions. The 
same DNA microarrays were also used to identify genes whose expression was affected 
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcrip- 
tional activator YAP1. These results demonstrate the feasibility and utility of this ap- 
proach to genomewide exploration of gene expression patterns. 



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2), 
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially 

Department of Biochemistry, Stanford University School 
of Medicine, Howard Hughes Medical Institute, Stanford, 
CA 94305-5428, USA. 

*To whom correspondence should be addressed. E-mail: 
pbrown@cmgm.stanford.edu 



favorable organism in which to conduct a 
systematic investigation of gene expression. 
The genes are easy to recognize in the ge- 
nome sequence, cis regulatory elements are 
generally compact and close to the tran- 
scription units, much is already known 
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its 
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by 



using a simple robotic printing device (9). 
Cells from an exponentially growing culture 
of yeast were inoculated into fresh medium 
and grown at 30°C for 21 hours. After an 
initial 9 hours of growth, samples were har- 
vested at seven successive 2-hour intervals, 
and mRNA was isolated (10). Fluorescently 
labeled cDNA was prepared by reverse tran- 
scription in the presence of Cy3 (green)- 
or Cy5 (red)-labeled deoxyuridine triphos- 
phate (dUTP) (11) and then hybridized to 
the microarrays (12). To maximize the re- 
liability with which changes in expression 
levels could be discerned, we labeled cDNA 
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3- 
labeled "reference" cDNA sample prepared 
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity 
measured for the Cy3 and Cy5 fluors at 
each array element provides a reliable mea- 
sure of the relative abundance of the corre- 
sponding mRNA in the two cell popula- 
tions (Fig. 1). Data from the series of seven 
samples (Fig. 2), consisting of more than 
43,000 expression-ratio measurements, 
were organized into a database to facilitate 
efficient exploration and analysis of the 
results. This database is publicly available 
on the Internet (13). 
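
The two-color readout described above can be sketched in a few lines. This is only an illustration: the gene names and intensities are made up, and the background subtraction and log2 transform are common conventions assumed here, not steps stated in the text.

```python
import math

def log2_ratio(cy5_intensity, cy3_intensity, background=0.0):
    """Return log2(Cy5/Cy3) for one array element.

    `background` is a hypothetical local-background value; the text does
    not describe how (or whether) background was handled.
    """
    red = cy5_intensity - background
    green = cy3_intensity - background
    if red <= 0 or green <= 0:
        return None  # spot too dim to give a meaningful ratio
    return math.log2(red / green)

# Made-up (Cy5, Cy3) intensities for three hypothetical array elements.
spots = {
    "geneA": (4000.0, 1000.0),  # brighter in the Cy5 (time-point) sample
    "geneB": (500.0, 2000.0),   # brighter in the Cy3 (reference) sample
    "geneC": (1500.0, 1500.0),  # unchanged
}
ratios = {name: log2_ratio(cy5, cy3) for name, (cy5, cy3) in spots.items()}
```

A positive value means the mRNA is more abundant in the Cy5-labeled sample than in the Cy3-labeled reference, and a negative value means the reverse.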

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the 
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels dif- 
fered by a factor of 2 or more for only 19 
genes (0.3%), and the largest of these dif- 
ferences was only 2.7-fold (14). However, as 
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for 
203 genes diminished by a factor of at least 
4. About half of these differentially ex- 
pressed genes have no currently recognized 
function and are not yet named. Indeed, 
more than 400 of the differentially ex- 
pressed genes have no apparent homology 
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to any gene whose function is known (15). 
The responses of these previously unchar- 
acterized genes to the diauxic shift therefore 
provide the first small clue to their possible 
roles. 
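
The counts quoted above come from applying a fold-change cutoff in each direction. A minimal sketch of that bookkeeping follows, assuming each gene is summarized by a single expression ratio (sample over reference). The gene labels and ratios are invented for illustration.

```python
def classify(fold_changes, threshold=2.0):
    """Split genes into induced and repressed lists at a fold-change cutoff.

    A gene is induced if its ratio is at least `threshold`, and repressed
    if its ratio is at most 1/threshold.
    """
    induced = [g for g, r in fold_changes.items() if r >= threshold]
    repressed = [g for g, r in fold_changes.items() if r <= 1.0 / threshold]
    return induced, repressed

# Invented ratios for five hypothetical genes.
example = {"g1": 4.5, "g2": 0.4, "g3": 1.3, "g4": 0.2, "g5": 2.0}

induced_2x, repressed_2x = classify(example, threshold=2.0)
induced_4x, repressed_4x = classify(example, threshold=4.0)
```

Raising the cutoff from 2 to 4 shrinks both lists, which mirrors the drop from roughly 710 and 1030 genes to 183 and 203 genes reported in the text.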

The global view of changes in expres- 
sion of genes with known functions pro- 
vides a vivid picture of the way in which 
the cell adapts to a changing environ- 
ment. Figure 3 shows a portion of the yeast 
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes 
coding for the enzymes aldehyde dehydro- 
genase (ALD2) and acetyl-coenzyme 
A (CoA) synthase (ACS1), which func- 
tion together to convert the products of 
alcohol dehydrogenase into acetyl-CoA, 
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away 
from acetaldehyde, and instead to oxalac- 
etate, where it can serve to supply the 
TCA cycle and gluconeogenesis. Induc- 
tion of the pivotal genes PCK1, encoding 
phosphoenolpyruvate carboxykinase, and 
FBP1, encoding fructose 1,6-bisphos- 
phatase, switches the directions of two key 
irreversible steps in glycolysis, reversing 
the flow of metabolites along the revers- 
ible steps of the glycolytic pathway toward 
the essential biosynthetic precursor, glu- 
cose-6-phosphate. Induction of the genes 
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of 
genes encoding pivotal enzymes can pro- 
vide insight into metabolic reprogram- 
ming, the behavior of large groups of func- 
tionally related genes can provide a broad 
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate cy- 
cle and carbohydrate storage, were coordi- 
nately induced by glucose exhaustion. In 
contrast, genes devoted to protein synthe- 
sis, including ribosomal proteins, tRNA 
synthetases, and translation, elongation, 
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy 
and illuminating exception was that the 



genes encoding mitochondrial ribosomal 
proteins were generally induced rather than 
repressed after glucose limitation, high- 
lighting the requirement for mitochondrial 
biogenesis (13). As more is learned about 
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment 
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of ex- 
pression could be recognized, and sets of 
genes could be grouped on the basis of the 
similarities in their expression patterns. The 
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be 
inferred for sets of genes with similar expres- 
sion profiles. For example, seven genes 
showed a late induction profile, with mRNA 
levels increasing by more than ninefold at 



the last timepoint but less than threefold at 
the preceding timepoint (Fig. 5B). All of 
these genes were known to be glucose-re- 
pressed, and five of the seven were previously 
noted to share a common upstream activat- 
ing sequence (UAS), the carbon source re- 
sponse element (CSRE) (16-20). A search 
in the promoter regions of the remaining two 
genes, ACR1 and IDP2, revealed that 
ACR1, a gene essential for ACS1 activity, 
also possessed a consensus CSRE motif, but 
interestingly, IDP2 did not. A search of the 
entire yeast genome sequence for the con- 
sensus CSRE motif revealed only four addi- 
tional candidate genes, none of which 
showed a similar induction. 
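
The "late induction" criterion above (more than ninefold at the last timepoint, less than threefold at the one before) is a simple filter over each gene's time series. The sketch below assumes each profile is a list of expression ratios in time order. The ORF labels and values are hypothetical.

```python
def late_induced(profiles, last_min=9.0, prev_max=3.0):
    """Return genes whose ratio exceeds `last_min` at the final timepoint
    while staying below `prev_max` at the preceding timepoint."""
    return [
        gene
        for gene, series in profiles.items()
        if len(series) >= 2 and series[-1] > last_min and series[-2] < prev_max
    ]

# Hypothetical seven-point profiles (ratios relative to the reference sample).
profiles = {
    "orfX": [1.0, 1.1, 0.9, 1.2, 1.5, 2.5, 10.2],  # sharp, late rise
    "orfY": [1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 12.0],  # gradual rise
    "orfZ": [1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 2.0],   # weak response
}
selected = late_induced(profiles)
```

Only the first profile passes: the second is already above threefold at the sixth timepoint, and the third never reaches ninefold.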

Examples from additional groups of 
genes that shared expression profiles are 
illustrated in Fig. 5, C through F. The 
sequences upstream of the named genes in 
Fig. 5C all contain stress response ele- 
ments (STRE), and with the exception 
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of HSP42, have previously been shown to 
be controlled at least in part by these 
elements (21-24). Inspection of the se- 
quences upstream of HSP42 and the two 
uncharacterized genes shown in Fig. 5C, 
YKL026c, a hypothetical protein with 
similarity to glutathione peroxidase, and 
YGR043c, a putative transaldolase, re- 
vealed that each of these genes also pos- 
sesses repeated upstream copies of the stress- 
responsive CCCCT motif. Of the 13 ad- 
ditional genes in the yeast genome that 
shared this expression profile [including 
HSP30, ALD2, OM45, and 10 uncharac- 
terized ORFs (25)], nine contained one or 
more recognizable STRE sites in their up- 
stream regions. 

The heterotrimeric transcriptional acti- 
vator complex HAP2,3,4 has been shown 
to be responsible for induction of several 
genes important for respiration (26-28). 
This complex binds a degenerate consensus 
sequence known as the CCAAT box (26). 
Computer analysis, using the consensus se- 
quence TNRYTGGB (29), has suggested 
that a large number of genes involved in 
respiration may be specific targets of 
HAP2,3,4 (30). Indeed, a putative 
HAP2,3,4 binding site could be found in 
the sequences upstream of each of the seven 
cytochrome c-related genes that showed 
the greatest magnitude of induction (Fig. 
5D). Of 12 additional cytochrome c-related 
genes that were induced, HAP2,3,4 binding 
sites were present in all but one. Signifi- 
cantly, we found that transcription of 
HAP4 itself was induced nearly ninefold 
concomitant with the diauxic shift. 
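
Searching upstream sequences for a degenerate consensus such as TNRYTGGB, or for the CCCCT stress-response motif mentioned earlier, amounts to expanding the IUPAC ambiguity codes and scanning both strands. The sketch below is one way to do that, not the authors' program: the function names and test sequences are made up, and only the two consensus strings come from the text.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_motif(upstream, consensus):
    """Return (strand, start, match) for every occurrence of `consensus`.

    Overlapping matches are reported (via a lookahead). Positions on the
    "-" strand are given in reverse-complement coordinates.
    """
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in consensus) + "))")
    seq = upstream.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1)))
    return hits

# Made-up test sequences, not real yeast promoters.
hap_hits = find_motif("ACGTAGCTGGCAAA", "TNRYTGGB")
stre_hits = find_motif("GGAGGGGCC", "CCCCT")
```

Scanning both strands matters because a regulatory element can sit in either orientation. In the second example the CCCCT motif is found only on the reverse complement.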

Control of ribosomal protein biogenesis 
is mainly exerted at the transcriptional 
level, through the presence of a common 
upstream-activating element (UASrpg) 
that is recognized by the Rap1 DNA-bind- 
ing protein (31, 32). The expression pro- 
files of seven ribosomal proteins are shown 
in Fig. 5F. A search of the sequences 
upstream of all seven genes revealed con- 
sensus Rap1-binding motifs (33). It has 
been suggested that declining Rap1 levels 
in the cell during starvation may be re- 
sponsible for the decline in ribosomal pro- 
tein gene expression (34). Indeed, we ob- 
served that the abundance of RAP1 
mRNA diminished by 4.4-fold, at about 
the time of glucose exhaustion. 

Of the 149 genes that encode known or 
putative transcription factors, only two, 
HAP4 and SIP4, were induced by a factor of 
more than threefold at the diauxic shift. 
SIP4 encodes a DNA-binding transcrip- 
tional activator that has been shown to 
interact with Snf1, the "master regulator" of 
glucose repression (35). The eightfold in- 
duction of SIP4 upon depletion of glucose 
strongly suggests a role in the induction of 



downstream genes at the diauxic shift. 

Although most of the transcriptional 
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were re- 
producible in duplicate experiments. For 
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expres- 



sion ratios measured in these duplicate 
experiments differed by less than a factor 
of 2. However, in a few cases, there were 
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
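
The reproducibility check described above rests on two numbers: a correlation coefficient between duplicate sets of expression ratios, and the fraction of genes whose duplicate ratios agree to within a factor of 2. The sketch below computes both. The replicate values are invented, and correlating log2 ratios rather than raw ratios is an assumption made here, since the text does not say which was used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fraction_within_twofold(a, b):
    """Fraction of paired ratios that differ by less than a factor of 2."""
    close = sum(1 for x, y in zip(a, b) if max(x, y) / min(x, y) < 2.0)
    return close / len(a)

# Invented duplicate measurements for five hypothetical genes.
rep1 = [4.0, 0.5, 1.0, 8.0, 0.25]
rep2 = [3.5, 0.6, 1.1, 3.0, 0.3]

r = pearson([math.log2(x) for x in rep1], [math.log2(x) for x in rep2])
frac = fraction_within_twofold(rep1, rep2)
```

In this toy data one of the five genes (8.0 versus 3.0) falls outside the twofold window, so `frac` is 0.8 even though the overall correlation is high.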

The changes in gene expression during 
this diauxic shift are complex and involve 
integration of many kinds of information 
about the nutritional and metabolic state 
of the cell. The large number of genes 
whose expression is altered and the diver- 
sity of temporal expression profiles ob- 
served in this experiment highlight the 
challenge of understanding the underlying 
regulatory mechanisms. One approach to 
defining the contributions of individual 
regulatory genes to a complex program of 
this kind is to use DNA microarrays to 
identify genes whose expression is affected 



Fig. 2. The section of the ar- 
ray indicated by the gray box 
in Fig. 1 is shown for each of 
the experiments described 
here. Representative genes 
are labeled. In each of the ar- 
rays used to analyze gene 
expression during the diauxic 
shift, red spots represent 
genes that were induced rel- 
ative to the initial timepoint, 
and green spots represent 
genes that were repressed 
relative to the initial timepoint. 
In the arrays used to analyze 
the effects of the tup1Δ mu- 
tation and YAP1 overexpres- 
sion, red spots represent 
genes whose expression was 
increased, and green spots 
represent genes whose ex- 
pression was decreased by 
the genetic modification. Note 
that distinct sets of genes are 
induced and repressed in the 
different experiments. The 
complete images of each of 
these arrays can be viewed on 
the Internet (13). Cell density 
as measured by optical densi- 
ty (OD) at 600 nm was used to 
measure the growth of the 
culture. 
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by mutations in each putative regulatory 
gene. As a test of this strategy, we analyzed 
the genomewide changes in gene expression 
that result from deletion of the TUP1 gene. 
Transcriptional repression of many genes by 
glucose requires the DNA-binding repressor 



Mig1 and is mediated by recruiting the tran- 
scriptional co-repressors Tup1 and Cyc8/ 
Ssn6 (39). Tup1 has also been implicated in 
repression of oxygen-regulated, mating-type- 
specific, and DNA-damage-inducible genes 
(40). 
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Fig. 3. Metabolic reprogramming inferred from global analysis of changes in gene expression. Only key 
metabolic intermediates are identified. The yeast genes encoding the enzymes that catalyze each step 
in this metabolic circuit are identified by name in the boxes. The genes encoding succinyl-CoA synthase 
and glycogen-debranching enzyme have not been explicitly identified, but the ORFs YGR244 and 
YPR184 show significant homology to known succinyl-CoA synthase and glycogen-debranching en- 
zymes, respectively, and are therefore included in the corresponding steps in this figure. Red boxes with 
white lettering identify genes whose expression increases in the diauxic shift. Green boxes with dark 
green lettering identify genes whose expression diminishes in the diauxic shift. The magnitude of 
induction or repression is indicated for these genes. For multimeric enzyme complexes, such as 
succinate dehydrogenase, the indicated fold-induction represents an unweighted average of all the 
genes listed in the box. Black and white boxes indicate no significant differential expression (less than 
twofold). The direction of the arrows connecting reversible enzymatic steps indicates the direction of the
flow of metabolic intermediates, inferred from the gene expression pattern, after the diauxic shift. Arrows
representing steps catalyzed by genes whose expression was strongly induced are highlighted in red. 
The broad gray arrows represent major increases in the flow of metabolites after the diauxic shift, 
inferred from the indicated changes in gene expression. 



Wild-type yeast cells and cells bearing
a deletion of the TUP1 gene (tup1Δ) were
grown in parallel cultures in rich medium
containing glucose as the carbon source.
Messenger RNA was isolated from exponentially
growing cells from the two populations
and used to prepare cDNA labeled
with Cy3 (green) and Cy5 (red),
respectively (11). The labeled probes were
mixed and simultaneously hybridized to
the microarray. Red spots on the microarray
therefore represented genes whose
transcription was induced in the tup1Δ
strain, and thus presumably repressed by
Tup1 (41). A representative section of the
microarray (Fig. 2, bottom middle panel)
illustrates that the genes whose expression
was affected by the tup1Δ mutation were,
in general, distinct from those induced
upon glucose exhaustion [complete images
of all the arrays shown in Fig. 2 are available
on the Internet (13)]. Nevertheless,
34 (10%) of the genes that were induced
by a factor of at least 2 after the diauxic
shift were similarly induced by deletion of
TUP1, suggesting that these genes may be
subject to TUP1-mediated repression by
glucose. For example, SUC2, the gene encoding
invertase, and all five hexose transporter
genes that were induced during the
course of the diauxic shift were similarly
induced, in duplicate experiments, by the
deletion of TUP1.
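
The comparison of the two experiments described above amounts to intersecting two sets of induced genes. The following is a minimal sketch of that bookkeeping, not the authors' analysis software. The function names, the dictionary layout (gene name mapped to red/green ratio), and all ratio values are hypothetical and chosen only for illustration.

```python
def induced(ratios, threshold=2.0):
    """Return the set of genes whose red/green ratio is at least `threshold`."""
    return {gene for gene, ratio in ratios.items() if ratio >= threshold}


def overlap(reference, other, threshold=2.0):
    """Genes induced in both experiments, and the fraction of the
    reference set that they represent."""
    ref_set = induced(reference, threshold)
    shared = ref_set & induced(other, threshold)
    fraction = len(shared) / len(ref_set) if ref_set else 0.0
    return shared, fraction


# Hypothetical ratios, for illustration only (not data from the paper).
diauxic_shift = {"SUC2": 8.0, "HXT2": 4.5, "GENE_X": 3.0, "GENE_Y": 1.1}
tup1_deletion = {"SUC2": 6.0, "HXT2": 2.5, "GENE_X": 1.2, "GENE_Y": 0.9}

shared, fraction = overlap(diauxic_shift, tup1_deletion)
```

With real data, the reference dictionary would hold the diauxic-shift ratios and the second dictionary the ratios from the deletion experiment.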

The set of genes affected by Tup1 in this
experiment also included α-glucosidases,
the mating-type-specific genes MFA1 and
MFA2, and the DNA damage-inducible
RNR2 and RNR4, as well as genes involved
in flocculation and many genes of unknown
function. The hybridization signal corresponding
to expression of TUP1 itself was
also severely reduced because of the (incomplete)
deletion of the transcription unit
in the tup1Δ strain, providing a positive
control in the experiment (42).

Many of the transcriptional targets of
Tup1 fell into sets of genes with related
biochemical functions. For instance, although
only about 3% of all yeast genes
appeared to be TUP1-repressed by a factor
of more than 2 in duplicate experiments
under these conditions, 6 of the 13 genes
that have been implicated in flocculation
(15) showed a reproducible increase in
expression of at least twofold when TUP1
was deleted. Another group of related
genes that appeared to be subject to TUP1
repression encodes the serine-rich cell
wall mannoproteins, such as Tip1 and
Tir1/Srp1, which are induced by cold
shock and other stresses (43), and similar,
serine-poor proteins, the seripauperins
(44). Messenger RNA levels for 23 of the
26 genes in this group were reproducibly
elevated by at least 2.5-fold in the tup1Δ
strain, and 18 of these genes were induced
by more than sevenfold when TUP1 was
deleted. In contrast, none of 83 genes that
could be classified as putative regulators of
the cell division cycle were induced more
than twofold by deletion of TUP1. Thus,
despite the diversity of the regulatory systems
that employ Tup1, most of the genes
that it regulates under these conditions
fall into a limited number of distinct functional
classes.
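
One way to gauge how unusual these proportions are is a simple binomial calculation. The sketch below is an illustration only: the text reports no such test, and the model (each gene independently TUP1-repressed with a background probability of 0.03, taken from the "about 3%" figure above) is an assumption made here for the example.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


BACKGROUND = 0.03  # assumed background rate, from the "about 3%" in the text

# Chance of seeing 6 or more of 13 flocculation genes induced by chance alone.
p_flocculation = binomial_tail(6, 13, BACKGROUND)

# Chance that none of the 83 cell-cycle regulators is induced by chance alone.
p_no_cell_cycle_hits = (1 - BACKGROUND) ** 83
```

Under this assumed model the flocculation result is very unlikely to arise by chance, whereas seeing no induced genes among 83 cell-cycle regulators is not surprising on its own.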

Because the microarray allows us to
monitor expression of nearly every gene in
yeast, we can, in principle, use this approach
to identify all the transcriptional
targets of a regulatory protein like Tup1. It
is important to note, however, that in any
single experiment of this kind we can only
recognize those target genes that are normally
repressed (or induced) under the
conditions of the experiment. For instance,
the experiment described here analyzed
a MATα strain in which MFA1
and MFA2, the genes encoding the a-factor
mating pheromone precursor, are
normally repressed. In the isogenic tup1Δ
strain, these genes were inappropriately
expressed, reflecting the role that Tup1
plays in their repression. Had we instead
carried out this experiment with a MATa
strain (in which expression of MFA1 and
MFA2 is not repressed), it would not have
been possible to conclude anything regarding
the role of Tup1 in the repression
of these genes. Conversely, we cannot distinguish
indirect effects of the chronic
absence of Tup1 in the mutant strain from
effects directly attributable to its participation
in repressing the transcription of a
gene.

Another simple route to modulating the
activity of a regulatory factor is to overexpress
the gene that encodes it. YAP1 encodes
a DNA-binding transcription factor
belonging to the b-zip class of DNA-binding
proteins. Overexpression of YAP1 in
yeast confers increased resistance to hydrogen
peroxide, o-phenanthroline, heavy
metals, and osmotic stress (45). We analyzed
differential gene expression between a
wild-type strain bearing a control plasmid
and a strain with a plasmid expressing YAP1
under the control of the strong GAL1-10
promoter, both grown in galactose (that is,
a condition that induces YAP1 overexpression).
Complementary DNA from the control
and YAP1-overexpressing strains, labeled
with Cy3 and Cy5, respectively, was
prepared from mRNA isolated from the two
strains and hybridized to the microarray.
Thus, red spots on the array represent genes
that were induced in the strain overexpressing
YAP1.

Of the 17 genes whose mRNA levels
increased by more than threefold when
YAP1 was overexpressed in this way, five
bear homology to aryl-alcohol oxidoreductases
(Fig. 2 and Table 1). An additional
four of the genes in this set also belong to
the general class of dehydrogenases/oxidoreductases.
Very little is known about
the role of aryl-alcohol oxidoreductases in
S. cerevisiae, but these enzymes have been
isolated from ligninolytic fungi, in which
they participate in coupled redox reactions,
oxidizing aromatic and aliphatic
unsaturated alcohols to aldehydes with the
production of hydrogen peroxide (46, 47).
The fact that a remarkable fraction of the
targets identified in this experiment belong
to the same small, functional group of
oxidoreductases suggests that these genes



Fig. 4. Coordinated reg- 
ulation of functionally re- 
lated genes. The curves 
represent the average in- 
duction or repression ra- 
tios for all the genes in 
each indicated group. 
The total number of 
genes in each group was 
as follows: ribosomal
proteins, 112; translation
elongation and initiation

might play an important protective role
during oxidative stress. Transcription of a
small number of genes was reduced in the
strain overexpressing Yap1. Interestingly,
many of these genes encode sugar permeases
or enzymes involved in inositol
metabolism.

We searched for Yap1-binding sites
(TTACTAA or TCACTAA) in the sequences
upstream of the target genes we
identified (48). About two-thirds of the
genes that were induced by more than
threefold upon Yap1 overexpression had
one or more binding sites within 600 bases
upstream of the start codon (Table 1), suggesting
that they are directly regulated by
Yap1. The absence of canonical Yap1-binding
sites upstream of the others may reflect
an ability of Yap1 to bind sites that differ
from the canonical binding sites, perhaps in
cooperation with other factors, or, less likely,
may represent an indirect effect of Yap1
overexpression, mediated by one or more
intermediary factors. Yap1 sites were found
only four times in the corresponding region
of an arbitrary set of 30 genes that were not
differentially regulated by Yap1.
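
A search of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not the procedure used in the study: it takes the two site sequences exactly as printed above, assumes the input string ends at the base immediately before the start codon, and assumes that both strands are scanned (the text does not say whether they were). The function names are hypothetical.

```python
import re

# Site sequences as printed in the text above.
SITES = ("TTACTAA", "TCACTAA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(upstream, window=600):
    """Count site occurrences on either strand within the last `window`
    bases of `upstream` (assumed to end just before the start codon)."""
    region = upstream[-window:].upper()
    total = 0
    for site in SITES:
        # A set avoids double counting if a site were its own reverse complement.
        for motif in {site, reverse_complement(site)}:
            # The lookahead lets overlapping occurrences be counted.
            total += len(re.findall(f"(?={motif})", region))
    return total
```

Applying `count_sites` to the upstream region of each induced gene, and to a control set of unregulated genes, would give counts comparable in spirit to those quoted above.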

Use of a DNA microarray to characterize
the transcriptional consequences of
mutations affecting the activity of regulatory
molecules provides a simple and powerful
approach to dissection and characterization
of regulatory pathways and networks.
This strategy also has an important
practical application in drug screening.
Mutations in specific genes encoding can- 
didate drug targets can serve as surrogates 
for the ideal chemical inhibitor or modu- 
lator of their activity. DNA microarrays 
can be used to define the resulting signa- 
ture pattern of alterations in gene expres- 
sion, and then subsequently used in an 
assay to screen for compounds that repro- 
duce the desired signature pattern. 
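
As a rough illustration of how a compound's expression profile might be compared with a mutant's signature pattern, the sketch below scores similarity as the Pearson correlation of log2 expression ratios. The choice of Pearson correlation, the function names, and the example profiles are all assumptions made for this example; the text does not specify a scoring method.

```python
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.
    Assumes neither sequence is constant (nonzero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def signature_similarity(mutant, compound):
    """Correlate log2 ratios over the genes measured in both profiles."""
    genes = sorted(set(mutant) & set(compound))
    return pearson([log2(mutant[g]) for g in genes],
                   [log2(compound[g]) for g in genes])


# Hypothetical expression ratios, for illustration only.
mutant_signature = {"A": 4.0, "B": 0.5, "C": 1.0, "D": 8.0}
candidate_hit = {"A": 3.5, "B": 0.6, "C": 1.1, "D": 6.0}
candidate_miss = {"A": 0.5, "B": 4.0, "C": 1.0, "D": 0.25}

score_hit = signature_similarity(mutant_signature, candidate_hit)
score_miss = signature_similarity(mutant_signature, candidate_miss)
```

A compound that reproduces the signature would score close to 1, while one with an opposite pattern would score below 0.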

DNA microarrays provide a simple and 
economical way to explore gene expres- 
sion patterns on a genomic scale. The 
hurdles to extending this approach to any 
other organism are minor. The equipment
required for fabricating and using DNA
microarrays (9) consists of components 
that were chosen for their modest cost and 
simplicity. It was feasible for a small group 
to accomplish the amplification of more 
than 6000 genes in about 4 months and, 
once the amplified gene sequences were in 
hand, only 2 days were required to print a 
set of 110 microarrays of 6400 elements 
each. Probe preparation, hybridization, 
and fluorescent imaging are also simple 
procedures. Even conceptually simple ex- 
periments, as we described here, can yield 
vast amounts of information. The value of 
the information from each experiment of 
this kind will progressively increase as 
more is learned about the functions of 
each gene and as additional experiments 
define the global changes in gene expres- 
sion in diverse other natural processes and 
genetic perturbations. Perhaps the greatest 
challenge now is to develop efficient 
methods for organizing, distributing, inter- 
preting, and extracting insights from the 
large volumes of data these experiments 
will provide.

ABSTRACT The recent ability to sequence whole genomes
allows ready access to all genetic material. The approaches 
outlined here allow automated analysis of sequence for the 
synthesis of optimal primers in an automated multiplex 
oligonucleotide synthesizer (AMOS). The efficiency is such 
that all ORFs for an organism can be amplified by PCR. The 
resulting amplicons can be used directly in the construction of 
DNA arrays or can be cloned for a large variety of functional 
analyses. These tools allow a replacement of single-gene 
analysis with a highly efficient whole-genome analysis. 



The genome sequencing projects have generated and will 
continue to generate enormous amounts of sequence data. The 
genomes of Saccharomyces cerevisiae, Escherichia coli, Hae- 
mophilus influenzae (1), Mycoplasma genitalium (2), and Meth- 
anococcus jannaschii (3) have been completely sequenced. 
Other model organisms have had substantial portions of their 
genomes sequenced as well, including the nematode Caeno- 
rhabditis elegans (4) and the small flowering plant Arabidopsis 
thaliana (5). This massive and increasing amount of sequence 
information allows the development of novel experimental 
approaches to identify gene function. 

One standard use of genome sequence data is to attempt to 
identify the functions of predicted open reading frames 
(ORFs) within the genome by comparison to genes of known 
function. Such a comparative analysis of all ORFs to existing 
sequence data is fast, simple, and requires no experimentation 
and is therefore a reasonable first step. While finding sequence 
homologies/motifs is not a substitute for experimentation, 
noting the presence of sequence homology and/or sequence 
motifs can be a useful first step in finding interesting genes, in 
designing experiments and, in some cases, predicting function. 
However, this type of analysis is frequently uninformative. For
example, over one-half of new ORFs in S. cerevisiae have no 
known function (6). If this is the case in a well studied organism 
such as yeast, the problem will be even worse in organisms that 
are less well studied or less manipulable. A large, experimen- 
tally determined gene function database would make homol- 
ogy/motif searches much more useful. 

Experimental analysis must be performed to thoroughly 
understand the biological function of a gene product. Scaling 
up from classical "cottage industry" one-gene-oriented ap- 
proaches to whole-genome analysis would be very expensive 
and laborious. It is clear that novel strategies are necessary to 
efficiently pursue the next phase of the genome projects — 
whole-genome experimental analysis to explore gene expres- 
sion, gene product function, and other genome functions. 
Model organisms, such as S. cerevisiae, will be extremely 



The publication costs of this article were defrayed in part by page charge 
payment. This article must therefore be hereby marked "advertisement" in 
accordance with 18 U.S.C. §1734 solely to indicate this fact. 

© 1997 by The National Academy of Sciences 0027-8424/97/948945-3$2.00/0
PNAS is available online at http://www.pnas.org. 



important in the development of novel whole-genome analysis 
techniques and, subsequently, in improving our understanding 
of other more complex and less manipulable organisms. 

The genome sequence can be systematically used as a tool 
to understand ORFs, gene product function, and other ge- 
nome regions. Toward this end, a directed strategy has been 
developed for exploiting sequence information as a means of 
providing information about biological function (Fig. 1). Ef- 
forts have been directed toward the amplification of each 
predicted ORF or any other region of the genome ranging 
from a few base pairs to several kilobase pairs. There are many 
uses for these amplicons — they can be cloned into standard 
vectors or specialized expression vectors, or can be cloned into 
other specialized vectors such as those used for two-hybrid 
analysis. The amplicons can also be used directly by, for 
example, arraying onto glass for expression analysis, for DNA 
binding assays, or for any direct DNA assay (7). As a pilot 
study, synthetic primers were made on the 96-well automated 
multiplex oligonucleotide synthesizer (AMOS) instrument (8) 
(Fig. 2). These oligonucleotides were used to amplify each 
ORF on yeast chromosome V. The current version of this 
instrument can synthesize three plates of 96 oligonucleotides 
each (25 bases) in an 8-hr day. The amplification of the entire 
set of PCR products was then analyzed by gel electrophoresis 
(Fig. 3). Successful amplification of the proper length product 
on the first attempt was 95%. This project demonstrates that 
one can go directly from sequence information to biological 
analysis in a truly automated, totally directed manner. 
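
The synthesizer throughput quoted above can be turned into a rough schedule. The sketch below assumes two primers (forward and reverse) per ORF and uninterrupted 8-hr days at three 96-well plates per day; the function name is hypothetical. The figures of 240 ORFs for the chromosome V pilot and about 6,000 ORFs for the whole genome are taken from elsewhere in this article.

```python
from math import ceil


def synthesis_days(n_orfs, primers_per_orf=2, plates_per_day=3, wells_per_plate=96):
    """Working days needed to synthesize all primers for `n_orfs` ORFs."""
    oligos = n_orfs * primers_per_orf
    return ceil(oligos / (plates_per_day * wells_per_plate))


days_chromosome_v = synthesis_days(240)   # 480 oligos at 288 per day -> 2 days
days_whole_genome = synthesis_days(6000)  # 12,000 oligos at 288 per day -> 42 days
```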

These amplicons can be incorporated directly in arrays or 
the amplicons can be cloned. If the amplicons are to be cloned, 
novel sequences can be incorporated at the 5' end of the 
oligonucleotide to facilitate cloning. One potential problem 
with cloning PCR products is that the cloned amplicons may 
contain sequence alterations that diminish their utility. One 
option would be to resequence each individual amplicon.
However, this is expensive, inefficient, and time consuming. A 
faster, more cost-effective, and more accurate approach is to 
apply comparative sequencing by denaturing HPLC (9). This 
method is capable of detecting a single base change in a 2-kb 
heteroduplex. Longer amplicons can be analyzed by use of 
appropriate restriction fragments. If any change is detected in 
a clone, an alternate clone of the same region can be analyzed. 
Modifying the system to allow high throughput analysis by 
denaturing HPLC is also relatively simple and straightforward. 

If amplicons are used directly on arrays without cloning, it 
is important to note that, even if single PCR product bands are 
observed on gels, the PCR products will be contaminated with 
various amounts of other sequences. This contamination has 
the potential to affect the results in, for example, expression

Fig. 1. Overview of systematic method for isolating individual
genes. Sequence information is obtained automatically from sequence 
databases. The data are input into primer selection software specifi- 
cally designed to target ORFs as designated by database annotations. 
The output file containing the primer information is directly read by 
a high-throughput oligonucleotide synthesizer, which makes the oli- 
gonucleotides in 96-well plates (AMOS, automated multiplex oligo- 
nucleotide synthesizer). The forward and reverse primers are synthe- 
sized in the same location on separate plates to facilitate the down- 
stream handling of primers. The amplicons are generated by PCR in 
96-well plates as well. 

analysis. On the other hand, direct use of the amplicons is 
much less labor intensive and greatly decreases the occurrence 
of mistakes in clone identification, a ubiquitous problem 
associated with large clone set archiving and retrieving. 

Any large-scale effort to capture each ORF within a genome 
must rely on automation if cost is to be minimized while 
efficiency is maximized. Toward that end, primers targeting 
ORFs were designed automatically using simple new scripts 
and existing primer selection software. These script-selected 
primer sequences were directly read by the high-throughput 
synthesizer and the forward and reverse primers were synthe- 
sized in separate plates in corresponding wells to facilitate 
automated pipetting and PCR amplifications. Each of the 
resulting PCR products, generated with minimum labor, con- 
tains a known, unique ORF. 
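
The plate bookkeeping described above can be sketched as follows. This is a deliberately naive illustration, not the scripts or the primer selection software used in the study: it simply takes the first 25 bases of each ORF as the forward primer and the reverse complement of the last 25 bases as the reverse primer, with no melting-temperature or specificity checks. All function names are hypothetical; the 25-base primer length is the one mentioned earlier in the article.

```python
ROWS = "ABCDEFGH"  # a 96-well plate has 8 rows and 12 columns


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def primer_pair(orf, length=25):
    """Naive end primers: the ORF's first `length` bases, and the reverse
    complement of its last `length` bases."""
    orf = orf.upper()
    if len(orf) < 2 * length:
        raise ValueError("ORF too short for two non-overlapping primers")
    return orf[:length], reverse_complement(orf[-length:])


def well_name(index):
    """Map 0..95 to well names A1..H12, filling row by row."""
    row, col = divmod(index, 12)
    return f"{ROWS[row]}{col + 1}"


def plate_layout(orfs):
    """Place forward and reverse primers for each ORF in the same well
    position on two separate plates."""
    if len(orfs) > 96:
        raise ValueError("at most 96 ORFs fit on one plate pair")
    forward_plate, reverse_plate = {}, {}
    for index, (name, seq) in enumerate(orfs.items()):
        fwd, rev = primer_pair(seq)
        well = well_name(index)
        forward_plate[well] = (name, fwd)
        reverse_plate[well] = (name, rev)
    return forward_plate, reverse_plate
```

Keeping the same well position on both plates is what allows a pipetting robot to combine each primer pair without any lookup step.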

Large-scale genome analysis projects are dependent on 
newly emerging technologies to make the studies practical and 
economically feasible. For example, the cost of the primers, a 
significant issue in the past, has been reduced dramatically to 
make feasible this and other projects that require tens of 
thousands of oligonucleotides. Other methods of high- 
throughput analysis are also vital to the success of functional 
analysis projects, such as microarraying and oligonucleotide 
chip methods (10-14). 

Changes in attitude are also required. One of the major costs 
of commercial oligonucleotides is extensive quality control 
such that virtually 100% of the supplied oligonucleotides are 
successfully synthesized and work for their intended purpose.

Fig. 2. Overall approach for using database of a genome to direct
biological analysis. The synthesis of the 6,000 ORFs (orfs) for each 
gene of S. cerevisiae can be used in many applications utilizing both 
cloning and microarraying technology. 

Considerable cost reduction can be obtained by simply de- 
creasing the expected successful synthesis rate to 95-97%. One 
can then achieve faster and cheaper whole genome coverage by 
simply adding a single quality control at the end of the 
experiment and batching the failures for resynthesis. 
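
To see why batching failures is cheap, consider how quickly the expected number of outstanding failures shrinks. The sketch below assumes that each oligonucleotide fails independently and that the same success rate applies to every resynthesis pass; both are assumptions made for this example, and the function name is hypothetical.

```python
def synthesis_passes(n_oligos, success_rate):
    """Number of synthesis passes (the first pass included) until the
    expected number of still-failed oligonucleotides drops below one."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    expected_failures = float(n_oligos)
    passes = 0
    while expected_failures >= 1.0:
        expected_failures *= 1.0 - success_rate
        passes += 1
    return passes


# 12,000 oligonucleotides (two primers for each of about 6,000 ORFs).
passes_at_95 = synthesis_passes(12000, 0.95)  # 12000 -> 600 -> 30 -> 1.5 -> 0.075
passes_at_97 = synthesis_passes(12000, 0.97)  # 12000 -> 360 -> 10.8 -> 0.324
```

Under these assumptions, the second pass already involves only a few hundred oligonucleotides, a small fraction of the original set.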

The directed nature of the amplicon approach is of clear 
advantage. The sequence of each ORF is analyzed automati- 
cally, and unique specific primers are made to target each 
ORF. Thus, there is relatively little time or labor involved; for
example, no random cloning and subsequent screening is
required because each product is known. In the test system,
primers for 240 ORFs from chromosome V were systematically 
synthesized, beginning from the left arm and continuing 
through to the right arm. At no point was there any manual 
analysis of sequence information to generate the collection. In 
many ways, now that the sequence is known, there is no need 
for the researcher to examine it. 

These amplicons can be arrayed and expression analysis can 
be done on all arrayed ORFs with a single hybridization (10). 
Those ORFs that display significant differential expression 
patterns under a given selection are easily identified without 
the laborious task of searching for and then sequencing a clone. 
Once scaled up, the procedure provides even greater returns 
on effort, because a single hybridization will ultimately provide 
a "snapshot" of the expression of all genes in the yeast genome. 
Thus, the limiting factor in whole genome analysis will not be 
the analysis process itself, but will instead be the ability of 
researchers to design and carry out experimental selections. 

Current expression and genetic analysis technologies are 
geared toward the analysis of single genes and are ill suited to 
analyze numerous genes under many conditions. Additional 
difficulties with current technologies include: the effort and 
expense required to analyze expression and make mutants, the 
potential duplication of effort if done by different laboratories, 
and the possibility of conflicting results obtained from differ- 
ent laboratories. In contrast, whole genome analysis not only 
is more efficient, it also provides data of much higher quality; 
all genes are assayed and compared in parallel under exactly

Fig. 3. Gel image of amplifications. Using the method described in Fig. 1, amplicons were generated for ORFs of S. cerevisiae chromosome
V. One plate of 96 amplification reactions is shown. 



the same conditions. In addition, amplicons have many appli- 
cations beyond gene expression. For example, one recent 
approach is to incorporate a unique DNA sequence tag, 
synthesized as part of each gene-specific primer, during
amplification. The tags or molecular bar codes, when reintro- 
duced into the organism as a gene deletion or as a gene clone, 
can be used much more efficiently than individual mutations 
or clones because pools of tagged mutants or transformants 
can be analyzed in parallel. This parallel analysis is possible 
because the tags are readily and quantitatively amplified even 
in complex mixtures of tags (13). 
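
The parallel readout of a tagged pool can be illustrated with a small calculation. The sketch below compares each tag's share of the total signal before and after a selection and reports a log2 ratio. The input layout (tag name mapped to a measured abundance, for example a hybridization signal), the pseudocount, and the function name are assumptions made for this example rather than the method of reference (13).

```python
from math import log2


def tag_enrichment(before, after, pseudocount=1.0):
    """log2 change in each tag's share of the pool after selection.
    A pseudocount keeps tags that disappear from producing log2(0)."""
    n_tags = len(before)
    total_before = sum(before.values()) + pseudocount * n_tags
    total_after = sum(after.get(tag, 0.0) for tag in before) + pseudocount * n_tags
    scores = {}
    for tag, signal in before.items():
        share_before = (signal + pseudocount) / total_before
        share_after = (after.get(tag, 0.0) + pseudocount) / total_after
        scores[tag] = log2(share_after / share_before)
    return scores


# Hypothetical tag abundances, for illustration only.
pool_before = {"tagA": 100, "tagB": 100, "tagC": 100}
pool_after = {"tagA": 200, "tagB": 100, "tagC": 0}
scores = tag_enrichment(pool_before, pool_after)
```

Positive scores mark tagged strains that became more abundant under the selection, and negative scores mark those that were depleted.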

These ORF genome arrays and oligonucleotide tagged 
libraries can be used for many applications. Any conventional 
selection applied to a library that gives discrete or multiple 
products can use these technologies for a simple direct read- 
out. These include screens and selections for mutant comple- 
mentation, overexpression suppression (15, 16), second-site 
suppressors, synthetic lethality, drug target overexpression 
(17), two-hybrid screens (18), genome mismatch scanning (19), 
or recombination mapping. 

The genome projects have provided researchers with a vast 
amount of information. These data must be used efficiently 
and systematically to gain a truly comprehensive understanding
of gene function and, more broadly, of the entire genome,
which can then be applied to other organisms. Such global
approaches are essential if we are to gain an understanding of 
the living cell. This understanding should come from the 
viewpoint of the integration of complex regulatory networks, 
the individual roles and interactions of thousands of functional 
gene products, and the effect of environmental changes on 
both gene regulatory networks and the roles of all gene 
products. The time has come to switch from the analysis of a 
single gene to the analysis of the whole genome. 

Support was provided by National Institutes of Health Grants 
R37H60198 and P01H600205.

INTRODUCTION

Technological advancements combined with in- 
tensive DNA sequencing efforts have generated an 
enormous database of sequence information over the 
past decade. To date, more than 3 million sequences, 
totaling over 2.2 billion bases [1], are contained 
within the GenBank database, which includes the 
complete sequences of 19 different organisms [2]. The 
first complete sequence of a free-living organism, 
Haemophilus influenzae, was reported in 1995 [3] and 
was followed shortly thereafter by the first complete 
sequence of a eukaryote, Saccharomyces cerevisiae [4].
The development of dramatically improved sequenc- 
ing methodologies promises that complete elucida- 
tion of the Homo sapiens DNA sequence is not far 
behind [5]. 

To exploit more fully the wealth of new sequence 
information, it was necessary to develop novel meth- 
ods for the high-throughput or parallel monitoring 
of gene expression. Established methods such as
northern blotting, RNase protection assays, S1 nuclease
analysis, plaque hybridization, and slot blots
do not provide sufficient throughput to effectively 
utilize the new genomics resources. Newer methods 
such as differential display [6], high-density filter 
hybridization [7,8], serial analysis of gene expression 
[9], and cDNA- and oligonucleotide-based microarray 
"chip" hybridization [10-12] are possible solutions 
to this bottleneck. It is our belief that the microarray 
approach, which allows the monitoring of expres- 
sion levels of thousands of genes simultaneously, is 
a tool of unprecedented power for use in toxicology 
studies. 



Almost without exception, gene expression is al- 
tered during toxicity, as either a direct or indirect 
result of toxicant exposure. The challenge facing 
toxicologists is to define, under a given set of ex- 
perimental conditions, the characteristic and spe- 
cific pattern of gene expression elicited by a given 
toxicant. Microarray technology offers an ideal plat- 
form for this type of analysis and could be the foun- 
dation for a fundamentally new approach to 
toxicology testing. 

MICROARRAY DEVELOPMENT AND APPLICATIONS 

cDNA Microarrays 

In the past several years, numerous systems were 
developed for the construction of large-scale DNA 
arrays. All of these platforms are based on cDNAs 
or oligonucleotides immobilized to a solid sup- 
port. In the cDNA approach, cDNA (or genomic) 
clones of interest are arrayed in a multi-well for- 
mat and amplified by polymerase chain reaction. 
The products of this amplification, which are usu- 
ally 500- to 2000-bp clones from the 3' regions of 
the genes of interest, are then spotted onto solid 
support by using high-speed robotics. By using 
this method, microarrays of up to 10 000 clones 
can be generated by spotting onto a glass substrate
[13,14]. Sample detection for microarrays on glass
involves the use of probes labeled with fluores- 
cent or radioactive nucleotides. 

Fluorescent cDNA probes are generated from con- 
trol and test RNA samples in single-round reverse-tran- 
scription reactions in the presence of fluorescently 
tagged dUTP (e.g., Cy3-dUTP and Cy5-dUTP), which 
produces control and test products labeled with dif- 
ferent fluors. The cDNAs generated from these two 
populations, collectively termed the "probe," are then 
mixed and hybridized to the array under a glass cov- 
erslip [10,11,15]. The fluorescent signal is detected 
by using a custom-designed scanning confocal mi- 
croscope equipped with a motorized stage and lasers 
for fluor excitation [10,11,15]. The data are analyzed 
with custom digital image analysis software that de- 
termines for each DNA feature the ratio of fluor 1 to 
fluor 2, corrected for local background [16,17]. The 
strength of this approach lies in the ability to label 
RNAs from control and treated samples with differ- 
ent fluorescent nucleotides, allowing for the simul- 
taneous hybridization and detection of both 
populations on one microarray. This method elimi- 
nates the need to control for hybridization between 
arrays. The research groups of Drs. Patrick Brown and 
Ron Davis at Stanford University spearheaded the 
effort to develop this approach, which has been suc- 
cessfully applied to studies of Arabidopsis thaliana 
RNA [10], yeast genomic DNA [15], tumorigenic ver- 
sus non-tumorigenic human tumor cell lines [11], 
human T-cells [18], yeast RNA [19], and human in- 
flammatory disease-related genes [20]. The most dra- 
matic result of this effort was the first published 
account of gene expression of an entire genome, that 
of the yeast Saccharomyces cerevisiae [21].
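
The per-feature quantity described above, the ratio of fluor 1 to fluor 2 corrected for local background, can be written out as a short calculation. This is a minimal sketch and not the custom image analysis software cited in the text. The function and parameter names are hypothetical, and the `floor` value, which keeps a background-subtracted signal from reaching zero or going negative, is an assumption added for this example.

```python
def corrected_ratio(fluor1, background1, fluor2, background2, floor=1.0):
    """Ratio of background-subtracted fluor 1 to fluor 2 for one DNA feature.
    Signals at or below background are clamped to `floor` so the ratio
    stays defined and positive."""
    signal1 = max(fluor1 - background1, floor)
    signal2 = max(fluor2 - background2, floor)
    return signal1 / signal2


# Hypothetical intensities: (1100 - 100) / (600 - 100) gives a ratio of 2.0.
example_ratio = corrected_ratio(1100, 100, 600, 100)
```

Because both fluors are read from the same spot on the same array, the ratio does not depend on how much DNA was deposited in that spot.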

In an alternative approach, large numbers of cDNA 
clones can be spotted onto a membrane support, al- 
beit at a lower density [7,22]. This method is useful 
for expression profiling and large-scale screening and 
mapping of genomic or cDNA clones [7,22-24]. In 
expression profiling on filter membranes, two dif- 
ferent membranes are used simultaneously for con- 
trol and test RNA hybridizations, or a single 
membrane is stripped and reprobed. The signal is 
detected by using radioactive nucleotides and visu- 
alized by phosphorimager analysis or autoradiogra- 
phy. Numerous companies now sell such cDNA 
membranes and software to analyze the image data 
[25-27]. 

Oligonucleotide Microarrays 

Oligonucleotide microarrays are constructed either 
by spotting prefabricated oligos on a glass support 
[13] or by the more elegant method of direct in situ 
oligo synthesis on the glass surface by photolithog- 
raphy [28-30]. The strength of this approach lies in 
its ability to discriminate DNA molecules based on 
single base-pair difference. This allows the applica- 
tion of this method to the fields of medical diagnos- 



tics, pharmacogenetics, and sequencing by hybrid- 
ization as well as gene-expression analysis. 

Fabrication of oligonucleotide chips by photoli- 
thography is theoretically simple but technically 
complex [29,30]. The light from a high-intensity 
mercury lamp is directed through a photolitho- 
graphic mask onto the silica surface, resulting in 
deprotection of the terminal nucleotides in the illu- 
minated regions. The entire chip is then reacted with 
the desired free nucleotide, resulting in selected chain 
elongation. This process requires only 4n cycles 
(where n = oligonucleotide length in bases) to syn- 
thesize a vast number of unique oligos, the total num- 
ber of which is limited only by the complexity of the 
photolithographic mask and the chip size [29,31,32]. 
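
The 4n figure follows from the fact that each coupling cycle adds one of the four nucleotides, so at most four cycles are needed per position, while the number of distinct sequences of length n grows as 4 to the power n. The short sketch below works this out; the function names are hypothetical, and the 25-base length is chosen only as an example.

```python
def synthesis_cycles(oligo_length):
    """Upper bound on coupling cycles: one per nucleotide (A, C, G, T)
    at each position."""
    return 4 * oligo_length


def possible_sequences(oligo_length):
    """Number of distinct oligonucleotide sequences of the given length."""
    return 4 ** oligo_length


cycles_25mer = synthesis_cycles(25)       # 100 cycles
sequences_25mer = possible_sequences(25)  # 4**25 = 1,125,899,906,842,624
```

The linear cost in cycles against the exponential space of sequences is why the mask complexity and chip area, rather than the chemistry, limit how many oligonucleotides can be made in parallel.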

Sample preparation involves the generation of 
double-stranded cDNA from cellular poly(A)+ RNA 
followed by antisense RNA synthesis in an in vitro 
transcription reaction with biotinylated or fluor- 
tagged nucleotides. The RNA probe is then frag- 
mented to facilitate hybridization. If the indirect 
visualization method is used, the chips are incubated 
with fluor-linked streptavidin (e.g., phycoerythrin) 
after hybridization [12,33]. The signal is detected with 
a custom confocal scanner [34]. This method has 
been applied successfully to the mapping of genomic 
library clones [35], to de novo sequencing by hybrid- 
ization [28,36], and to evolutionary sequence com- 
parison of the BRCA1 gene [37]. In addition, 
mutations in the cystic fibrosis [38] and BRCA1 [39] 
gene products and polymorphisms in the human im- 
munodeficiency virus- 1 clade B protease gene [40] 
have been detected by this method. Oligonucleotide 
chips are also useful for expression monitoring [33] 
as has been demonstrated by the simultaneous evalu- 
ation of gene-expression patterns in nearly all open 
reading frames of the yeast strain S. cerevisiae [12]. 
More recently, oligonucleotide chips have been used 
to help identify single nucleotide polymorphisms in 
the human [41] and yeast [42] genomes. 

THE USE OF MICROARRAYS IN TOXICOLOGY 

Screening for Mechanism of Action 

The field of toxicology uses numerous in vivo 
model systems, including the rat, mouse, and rab- 
bit, to assess potential toxicity and these bioassays 
are the mainstay of toxicology testing. However, in 
the past several decades, a plethora of in vitro tech- 
niques have been developed to measure toxicity, 
many of which measure toxicant-induced DNA dam- 
age. Examples of these assays include the Ames test, 
the Syrian hamster embryo cell transformation as- 
say, micronucleus assays, measurements of sister 
chromatid exchange and unscheduled DNA synthe- 
sis, and many others. Fundamental to all of these 
methods is the fact that toxicity is often preceded 
by, and results in, alterations in gene expression. In 
many cases, these changes in gene expression are a 
far more sensitive, characteristic, and measurable 
endpoint than the toxicity itself. We therefore pro- 
pose that a method based on measurements of the 
genome-wide gene expression pattern of an organ- 
ism after toxicant exposure is fundamentally infor- 
mative and complements the established methods 
described above. 

We are developing a method by which toxicants 
can be identified and their putative mechanisms of 
action determined by using toxicant-induced gene ex- 
pression profiles. In this method, in one or more de- 
fined model systems, dose and time-course parameters 
are established for a series of toxicants within a given 
prototypic class (e.g., polycyclic aromatic hydrocar- 
bons (PAHs)). Cells are then treated with these agents 
at a fixed toxicity level (as measured by cell survival), 
RNA is harvested, and toxicant-induced gene expres- 
sion changes are assessed by hybridization to a cDNA 
microarray chip (Figure 1). We have developed a cus-
tom DNA chip, called ToxChip v1.0, specifically for
this purpose and will discuss it in more detail below. 
The changes in gene expression induced by the test 
agents in the model systems are analyzed, and the 
common set of changes unique to that class of toxi- 
cants, termed a toxicant signature, is determined. 

This signature is derived by ranking across all ex- 
periments the gene-expression data based on rela-
tive fold induction or suppression of genes in treated
samples versus untreated controls and selecting the 
most consistently different signals across the sample 
set. A different signature may be established for each 
prototypic toxicant class. Once the signatures are de- 
termined, gene-expression profiles induced by un- 
known agents in these same model systems can then 
be compared with the established signatures. A match 
assigns a putative mechanism of action to the test 
compound. Figure 2 illustrates this signature method 
for different types of oxidant stressors, PAHs, and 
peroxisome proliferators. In this example, the un- 
known compound in question had a gene-expres- 
sion profile similar to that of the oxidant stressors in 
the database. We anticipate that this general method 
will also reveal cross talk between different pathways 
induced by a single agent (e.g., reveal that a com- 
pound has both PAH-like and oxidant-like proper- 
ties). In the future, it may be necessary to distinguish 
very subtle differences between compounds within 
a very large sample set (e.g., thousands of highly simi- 
lar structural isomers in a combinatorial chemistry 
library or peptide library). To generate these highly 
refined signatures, standard statistical clustering tech- 
niques or principal-component analysis can be used. 
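
The signature method described above can be sketched in a few lines. This is a simplified illustration, not the authors' procedure: it swaps the ranking step for a fixed threshold, keeping genes whose log2 fold change passes the cutoff in the same direction in every experiment of a class. The function names, the cutoff of 1.0, and the gene labels are all hypothetical.

```python
def toxicant_signature(experiments, min_abs_log2=1.0):
    """Return {gene: +1 or -1} for genes induced (+1) or suppressed (-1)
    by at least min_abs_log2 in every experiment of one toxicant class.
    Each experiment maps gene -> log2(treated / control)."""
    shared = set.intersection(*(set(e) for e in experiments))
    signature = {}
    for gene in shared:
        values = [e[gene] for e in experiments]
        if all(v >= min_abs_log2 for v in values):
            signature[gene] = 1
        elif all(v <= -min_abs_log2 for v in values):
            signature[gene] = -1
    return signature


def match_score(profile, signature, min_abs_log2=1.0):
    """Fraction of signature genes that the unknown agent's profile
    changes in the same direction by at least the cutoff."""
    if not signature:
        return 0.0
    hits = sum(
        1 for gene, direction in signature.items()
        if direction * profile.get(gene, 0.0) >= min_abs_log2
    )
    return hits / len(signature)


# Hypothetical log2 fold changes from two known oxidant stressors.
oxidant_runs = [
    {"geneA1": 2.1, "geneA2": -1.5, "geneB1": 0.1},
    {"geneA1": 1.8, "geneA2": -2.0, "geneB1": -0.3},
]
oxidant_sig = toxicant_signature(oxidant_runs)
unknown = {"geneA1": 1.6, "geneA2": -1.2, "geneB1": 0.0}
score = match_score(unknown, oxidant_sig)
```

A score near 1 against one class signature and near 0 against the others would suggest a putative mechanism; intermediate scores against two signatures would be one way to flag the cross talk mentioned above. For large, closely related compound sets, clustering or principal-component analysis would replace this simple overlap score.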

For the studies outlined in Figure 2, we developed 
the custom cDNA microarray chip ToxChip v1.0. 
Figure 1. Simplified overview of the method for sample
preparation and hybridization to cDNA microarrays. For illus-
trative purposes, samples derived from cell culture are depicted,
although other sample types are amenable to this analysis.

Figure 2. Schematic representation of the method for iden-
tification of a toxicant's mechanism of action. In this method,
gene-expression data derived from exposure of model sys-
tems to known toxicants are analyzed, and a set of changes
characteristic of that type of toxicant (termed the toxicant
signature) is identified. As depicted, oxidant stressors produce
consistent changes in group A genes (indicated by red and
green circles), but not group B or C genes (indicated by gray
circles). The set of gene-expression changes elicited by the
suspected toxicant is then compared with these characteristic
patterns, and a putative mechanism of action is assigned to
the unknown agent.



The 2090 human genes that comprise this subarray 
were selected for their well-documented involve- 
ment in basic cellular processes as well as their re- 
sponses to different types of toxic insult. Included 
on this list are DNA replication and repair genes, 
apoptosis genes, and genes responsive to PAHs and 
dioxin-like compounds, peroxisome proliferators, 
estrogenic compounds, and oxidant stress. Some of 
the other categories of genes include transcription 
factors, oncogenes, tumor suppressor genes, cyclins, 
kinases, phosphatases, cell adhesion and motility 
genes, and homeobox genes. Also included in this 
group are 84 housekeeping genes, whose hybridiza- 
tion intensity is averaged and used for signal nor- 
malization of the other genes on the chip. To date, 
very few toxicants have been shown to have appre- 
ciable effects on the expression of these housekeep- 
ing genes. However, this housekeeping list will be 
revised if new data warrant the addition or deletion 
of a particular gene. Table 1 contains a general de- 
scription of some of the different classes of genes 
that comprise ToxChip v1.0. 
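
The housekeeping normalization described above amounts to dividing every spot intensity by the mean intensity of the housekeeping spots on the same chip. A minimal sketch follows; the function name and the gene labels (hk1, hk2, test_gene) are hypothetical placeholders, not genes known to be on the chip.

```python
def normalize_by_housekeeping(intensities, housekeeping):
    """Divide each gene's hybridization intensity by the mean intensity
    of the housekeeping genes measured on the same chip."""
    reference = [intensities[g] for g in housekeeping if g in intensities]
    if not reference:
        raise ValueError("no housekeeping genes found on this chip")
    mean_reference = sum(reference) / len(reference)
    return {gene: value / mean_reference
            for gene, value in intensities.items()}


# Hypothetical raw intensities from one hybridization.
raw = {"hk1": 100.0, "hk2": 300.0, "test_gene": 800.0}
normalized = normalize_by_housekeeping(raw, ["hk1", "hk2"])
```

Because the same reference set is used on every chip, normalized values from treated and control hybridizations can be compared directly, provided the toxicant does not itself shift the housekeeping genes.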

When a toxicant signature is determined, the 
genes within this signature are flagged within the 
database. When uncharacterized toxicants are then 
screened, the data can be quickly reformatted so that 
blocks of genes representing the different signatures
are displayed [11]. This facilitates rapid, visual in-
terpretation of data. We are also developing Tox- 
Chip v2.0 and chips for other model systems, 
including rat, mouse, Xenopus, and yeast, for use in 
toxicology studies. 

Animal Models in Toxicology Testing 

The toxicology community relies heavily on the 
use of animals as model systems for toxicology test- 
ing. Unfortunately, these assays are inherently ex- 
pensive, require large numbers of animals and take a 
long time to complete and analyze. Therefore, the 
National Institute of Environmental Health Sciences 
(NIEHS), the National Toxicology Program, and the 
toxicology community at large are committed to re- 
ducing the number of animals used, by developing 
more efficient and alternative testing methodologies. 
Although substantial progress has been made in the 
development of alternative methods, bioassays are 
still used for testing endpoints such as neurotoxic- 
ity, immunotoxicity, reproductive and developmen- 
tal toxicology, and genetic toxicology. The rodent 
cancer bioassay is a particularly expensive and time- 
consuming assay, as it requires almost 4 yr, 1200 
animals, and millions of dollars to execute and ana- 
lyze [43]. In vitro experiments of the type outlined 
in Figure 2 might provide evidence that an unknown 
Table 1. ToxChip v1.0: A Human cDNA Microarray
Chip Designed to Detect Responses to Toxic Insult

Gene category                            No. of genes on chip

Apoptosis                                 72
DNA replication and repair                99
Oxidative stress/redox homeostasis        90
Peroxisome proliferator responsive        22
Dioxin/PAH responsive                     12
Estrogen responsive                       63
Housekeeping                              84
Oncogenes and tumor suppressor genes      76
Cell-cycle control                        51
Transcription factors                    131
Kinases                                  276
Phosphatases                              88
Heat-shock proteins                       23
Receptors                                349
Cytochrome P450s                          30

*This list is intended as a general guide. The gene categories are not
unique, and some genes are listed in multiple categories.

agent is (or is not) responsible for eliciting a given 
biological response. This information would help to 
select a bioassay more specifically suited to the agent 
in question or perhaps suggest that a bioassay is not 
necessary, which would dramatically reduce cost, 
animal use, and time. 

The addition of microarray techniques to stan- 
dard bioassays may dramatically enhance the sen- 
sitivity and interpretability of the bioassay and 
possibly reduce its cost. Gene-expression signatures 
could be determined for various types of tissue-spe- 
cific toxicants, and new compounds could be 
screened for these characteristic signatures, provid- 
ing a rapid and sensitive in vivo test. Also, because 
gene expression is often exquisitely sensitive to low 
doses of a toxicant, the combination of gene-expres- 
sion screening and the bioassay might allow the use 
of lower toxicant doses, which are more relevant to 
human exposure levels, and the use of fewer ani- 
mals. In addition, gene-expression changes are nor- 
mally measured in hours or days, not in the months 
to years required for tumor development. Further- 
more, microarrays might be particularly useful for 
investigating the relationship between acute and 
chronic toxicity and identifying secondary effects 
of a given toxicant by studying the relationship 
between the duration of exposure to a toxicant and 
the gene-expression profile produced. Thus, a bio- 
assay that incorporates gene-expression signatures 
with traditional endpoints might be substantially 
shorter, use more realistic dose regimens, and cost 
substantially less than the current assays do. 

These considerations are also relevant for branches 
of toxicology not related to human health and not 
using rodents as model systems, such as aquatic toxi- 
cology and plant pathology. Bioassays based on the
fathead minnow, Daphnia, and Arabidopsis could
also be improved by the addition of microarray analy-
sis. The combination of microarrays with traditional 
bioassays might also be useful for investigating some 
of the more intractable problems in toxicology re- 
search, such as the effects of complex mixtures and 
the difficulties in cross-species extrapolation. 

Exposure Assessment, Environmental Monitoring, 
and Drug Safety 

The currently used methods for assessment of ex- 
posure to chemical toxicants are based on measure- 
ment of tissue toxin levels or on surrogate markers 
of toxicity, termed biomarkers (e.g., peripheral blood 
levels of hepatic enzymes or DNA adducts). Because 
gene expression is a sensitive endpoint, gene expres- 
sion as measured with microarray technology may 
be useful as a new biomarker to more precisely iden- 
tify hazards and to assess exposure. Similarly, 
microarrays could be used in an environmental- 
monitoring capacity to measure the effect of poten- 
tial contaminants on the gene-expression profiles 
of resident organisms. In an analogous fashion, 
microarrays could be used to measure gene-expres- 
sion endpoints in subjects in clinical trials. The com- 
bination of these gene-expression data and more 
established toxic endpoints in these trials could be 
used to define highly precise surrogates of safety. 

Gene-expression profiles in samples from exposed 
individuals could be compared to the profiles of the 
same individuals before exposure. From this infor- 
mation, the nature of the toxic exposure can be de- 
termined or a relative clinical safety factor estimated. 
In the future it may also be possible to estimate not 
only the nature but the dose of the toxicant for a 
given exposure, based on relative gene-expression 
levels. This general approach may be particularly 
appropriate for occupational-health applications, in 
which unexposed and exposed samples from the 
same individuals may be obtainable. For example, 
a pilot study of gene expression in peripheral-blood 
lymphocytes of Polish coke-oven workers exposed 
to PAHs (and many other compounds) is under con- 
sideration at the NIEHS. An important consideration 
for these types of studies is that gene expression can 
be affected by numerous factors, including diet, 
health, and personal habits. To reduce the effects 
of these confounding factors, it may be necessary 
to compare pools of control samples with pools of 
treated samples. In the future it may be possible to 
compare exposed sample sets to a national database 
of human-expression data, thus eliminating the 
need to provide an unexposed sample from the same 
individual. Efforts to develop such a national gene- 
expression database are currently under way [44,45]. 
However, this national database approach will re- 
quire a better understanding of genome-wide gene 
expression across the highly diverse human popu- 
lation and of the effects of environmental factors 
on this expression. 
Alleles, Oligo Arrays, and Toxicogenetics 

Gene sequences vary between individuals, and 
this variability can be a causative factor in human 
diseases of environmental origin [46,47]. A new area 
of toxicology, termed toxicogenetics, was recently 
developed to study the relationship between genetic 
variability and toxicant susceptibility. This field is 
not the subject of this discussion, but it is worth- 
while to note that the ability of oligonucleotide ar- 
rays to discriminate DNA molecules based on single 
base-pair differences makes these arrays uniquely 
useful for this type of analysis. Recent reports dem- 
onstrated the feasibility of this approach [41,42]. 
The NIEHS has initiated the Environmental Genome 
Project to identify common sequence polymor- 
phisms in 200 genes thought to be involved in en- 
vironmental diseases [48]. In a pilot study on the 
feasibility of this application to the Environmental 
Genome Project, oligonucleotide arrays will be used 
to resequence 20 candidate genes. This toxicogenetic 
approach promises to dramatically improve our un- 
derstanding of interindividual variability in disease 
susceptibility. 

FUTURE PRIORITIES 

There are many issues that must be addressed be- 
fore the full potential of microarrays in toxicology 
research can be realized. Among these are model sys- 
tem selection, dose selection, and the temporal na- 
ture of gene expression. In other words, in which 
species, at what dose, and at what time do we look 
for toxicant-induced gene expression? If human 
samples are analyzed, how variable is global gene 
expression between individuals, before and after toxi- 
cant exposure? What are the effects of age, diet, and 
other factors on this expression? Experience, in the 
form of large data sets of toxicant exposures, will 
answer these questions. 

One of the most pressing issues for array scientists 
is the construction of a national public database 
(linked to the existing public databases) to serve as a 
repository for gene-expression data. This relational 
database must be made available for public use, and 
researchers must be encouraged to submit their ex- 
pression data so that others may view and query the 
information. Researchers at the National Institutes 
of Health have made laudable progress in develop- 
ing the first generation of such a database [44,45]. In 
addition, improved statistical methods for gene clus- 
tering and pattern recognition are needed to ana- 
lyze the data in such a public database. 

The proliferation of different platforms and meth- 
ods for microarray hybridizations will improve 
sample handling and data collection and analysis and 
reduce costs. However, the variety of microarray 
methods available will create problems of data com- 
patibility between platforms. In addition, the near- 
infinite variety of experimental conditions under
which data will be collected by different laborato-
ries will make large-scale data analysis extremely dif- 
ficult. To help circumvent these future problems, a 
set of standards to be included on all platforms 
should be established. These standards would facili- 
tate data entry into the national database and serve 
as reference points for cross-platform and inter-labo- 
ratory data analysis. 

Many issues remain to be resolved, but it is clear 
that new molecular techniques such as microarray 
hybridization will have a dramatic impact on toxicol- 
ogy research. In the future, the information gathered 
from microarray-based hybridization experiments will 
form the basis for an improved method to assess the 
impact of chemicals on human and environmental 
health. 
Abstract 

Recent progress in genomics and proteomics technologies has created a unique opportunity to significantly impact 
the pharmaceutical drug development processes. The perception that cells and whole organisms express specific 
inducible responses to stimuli such as drug treatment implies that unique expression patterns, molecular fingerprints, 
indicative of a drug's efficacy and potential toxicity are accessible. The integration into state-of-the-art toxicology of 
assays allowing one to profile treatment-related changes in gene expression patterns promises new insights into 
mechanisms of drug action and toxicity. The benefits will be improved lead selection, and optimized monitoring of 
drug efficacy and safety in pre-clinical and clinical studies based on biologically relevant tissue and surrogate markers. 
© 2000 Elsevier Science Ireland Ltd. All rights reserved. 
1. Introduction 

The majority of drugs act by binding to protein 
targets, most to known proteins representing en- 
zymes, receptors and channels, resulting in effects 
such as enzyme inhibition and impairment of 
signal transduction. The treatment-induced per- 
turbations provoke feedback reactions aiming to 
compensate for the stimulus, which almost always 
are associated with signals to the nucleus, result- 
ing in altered gene expression. Such gene expres- 
sion regulations account for both the 
pharmacological action and the toxicity of a drug 
and can be visualized by either global mRNA or 
global protein expression profiling. Hence, for 
each individual drug, a characteristic gene regula- 
tion pattern, its molecular fingerprint, exists 
which bears valuable information on its mode of 
action and its mechanism of toxicity. 

Gene expression is a multistep process that 
results in an active protein (Fig. 1). There exist 
numerous regulation systems that exert control at 
and after the transcription and the translation 
step. Genomics, by definition, encompasses the 
quantitative analysis of transcripts at the mRNA 
level, while the aim of proteomics is to quantify 
gene expression further down-stream, creating a 
snapshot of gene regulation closer to ultimate cell 
function control. 
2. Global mRNA profiling 

Expression data at the mRNA level can be 
produced using a set of different technologies 
such as DNA microarrays, reverse transcript 
imaging, amplified fragment length polymorphism 
(AFLP), serial analysis of gene expression 
(SAGE) and others. Currently, DNA microarrays 
are very popular and promise a great potential. 
On a typical array, each gene of interest is repre- 
sented either by a long DNA fragment (200-2400 
bp) typically generated by polymerase chain reac- 
tion (PCR) and spotted on a suitable substrate 
using robotics (Schena et al., 1995; Shalon et al., 
1996) or by several short oligonucleotides (20-30 
bp) synthesized directly onto a solid support using 
photolabile nucleotide chemistry (Fodor et al., 
1991; Chee et al., 1996). From control and treated 
tissues, total RNA or mRNA is isolated and 
reverse transcribed in the presence of radioactive 
or fluorescent labeled nucleotides, and the labeled 
probes are then hybridized to the arrays. The 
intensity of the array signal is measured for each 
gene transcript by either autoradiography or laser 
scanning confocal microscopy. The ratio between 
the signals of control and treated samples reflects 
the relative drug-induced change in transcript 
abundance. 
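
The ratio described above is commonly reported on a log2 scale, so that a twofold induction and a twofold suppression are symmetric around zero. The sketch below is illustrative only: the function name is hypothetical, the ratio is taken as treated over control, and the intensity floor (used to avoid dividing by, or taking the log of, near-zero background signals) is an assumption rather than part of the method described.

```python
import math


def log2_fold_change(treated, control, floor=1.0):
    """Return {gene: log2(treated / control)} for genes measured in both
    samples. Signals below `floor` are raised to `floor` first."""
    changes = {}
    for gene in treated.keys() & control.keys():
        t = max(treated[gene], floor)
        c = max(control[gene], floor)
        changes[gene] = math.log2(t / c)
    return changes


# Hypothetical array signals for two transcripts.
treated_signal = {"g1": 400.0, "g2": 50.0}
control_signal = {"g1": 100.0, "g2": 200.0}
fold = log2_fold_change(treated_signal, control_signal)
```

Positive values indicate transcripts that are more abundant after treatment, negative values transcripts that are less abundant.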



3. Global protein profiling 

Global quantitative expression analysis at the 
protein level is currently restricted to the use of 
two-dimensional gel electrophoresis. This tech- 
nique combines separation of tissue proteins by 
isoelectric focusing in the first dimension and by 
sodium dodecyl sulfate slab gel electrophoresis- 
based molecular weight separation on the second, 
orthogonal dimension (Anderson et al., 1991). 
The product is a rectangular pattern of protein 
spots that are typically revealed by Coomassie 
Blue, silver or fluorescent staining (Fig. 2). 
Protein spots are identified by mass spectrometry 
following generation of peptide mass fingerprints 
(Mann et al., 1993) and sequence tags (Wilkins et 
al., 1996). As in the mRNA approach, the 
optical densities of spots from control and 
treated samples are compared to search for 
treatment-related changes. 

4. Expression data analysis 

Bioinformatics forms a key element required to 
organize, analyze and store expression data from 
either source, the mRNA or the protein level. The 
overall objective, once a mass of high-quality 
Fig. 1. Production of an active protein is a multistep process in which numerous regulation systems exert control at various stages 
of expression. Molecular fingerprints of drugs can be visualized through expression profiling at the mRNA level (genomics) using 
a variety of technologies and at the protein level (proteomics) using two-dimensional gel electrophoresis. 

Fig. 2. Computerized representation of a Coomassie Blue stained two-dimensional gel electrophoresis pattern of Fischer F344 rat 
liver homogenate. 



quantitative expression data has been collected, is 
to visualize complex patterns of gene expression 
changes, to detect pathways and sets of genes 
tightly correlated with treatment efficacy and toxi- 
city, and to compare the effects of different sets of 
treatment (Anderson et al., 1996). As the drug 
effect database is growing, one may detect similar- 
ities and differences between the molecular finger- 
prints produced by various drugs, information 
that may be crucial to make a decision whether to 
refocus or extend the therapeutic spectrum of a 
drug candidate. 



5. Comparison of global mRNA and protein 
expression profiling 

There are several synergies and overlaps of data 
obtained by mRNA and protein expression analy- 
sis. Low abundant transcripts may not be easily 
quantified at the protein level using standard two- 
dimensional gel electrophoresis analysis and their 
detection may require prefractionation of sam- 
ples. The expression of such genes may be prefer- 
ably quantified at the mRNA level using 
techniques allowing PCR-mediated target amplifi- 
cation. Tissue biopsy samples typically yield mRNA 
and proteins of good quality; however, the 
quality of mRNA isolated from body fluids is 
often poor due to the faster degradation of 
mRNA when compared with proteins. RNA sam- 
ples from body fluids such as serum or urine are 
often not very 'meaningful', and secreted proteins 
are likely more reliable surrogate markers for 
treatment efficacy and safety. Detection of post- 
translational modifications, events often related to 
function or nonfunction of a protein, is restricted 
to protein expression analysis and rarely can be 
predicted by mRNA profiling. Information on 
subcellular localization and translocation of 
proteins has to be acquired at the level of the 
protein in combination with sample prefractiona- 
tion procedures. The growing evidence of a poor 
correlation between mRNA and protein abun- 
dance (Anderson and Seilhamer, 1997) further 
suggests that the two approaches, mRNA and 
protein profiling, are complementary and should 
be applied in parallel. 

6. Expression profiling and drug development 

Understanding the mechanisms of action and 
toxicity, and being able to monitor treatment 
efficacy and safety during trials is crucial for the 
successful development of a drug. Mechanistic 
insights are essential for the interpretation of drug 
effects and enhance the chances of recognizing 
potential species specificities contributing to an 
improved risk profile in humans (Richardson et 
al., 1993; Steiner et al., 1996b; Aicher et al., 1998). 
The value of expression profiling further increases 
when links between treatment-induced expression 
profiles and specific pharmacological and toxic 
endpoints are established (Anderson et al., 1991, 
1995, 1996; Steiner et al., 1996a). Changes in gene 
expression are known to precede the manifesta- 
tion of morphological alterations, giving expres- 
sion profiling a great potential for early 
compound screening, enabling one to select drug 
candidates with wide therapeutic windows 
reflected by molecular fingerprints indicative of 
high pharmacological potency and low toxicity 
(Arce et al., 1998). In later phases of drug devel- 
opment, surrogate markers of treatment efficacy 
and toxicity can be applied to optimize the moni- 
toring of pre-clinical and clinical studies (Doherty 
et al., 1998). 



7. Perspectives 

The basic methodology of safety evaluation has 
changed little during the past decades. Toxicity in 
laboratory animals has been evaluated primarily 
by using hematological, clinical chemistry and 
histological parameters as indicators of organ 
damage. The rapid progress in genomics and pro- 
teomics technologies creates a unique opportunity 
to dramatically improve the predictive power of 
safety assessment and to accelerate the drug devel- 
opment process. Application of gene and protein 
expression profiling promises to improve lead se- 
lection, resulting in the development of drug can- 
didates with higher efficacy and lower toxicity. 
The identification of biologically relevant surro- 
gate markers correlated with treatment efficacy 
and safety bears a great potential to optimize the 
monitoring of pre-clinical and clinical trials. 
DNA array technology makes it possible to rapidly genotype individuals or quantify the expression 
of thousands of genes on a single filter or glass slide, and holds enormous potential in toxicologic 
applications. This potential led to a U.S. Environmental Protection Agency-sponsored workshop 
titled "Application of Microarrays to Toxicology" on 7-8 January 1999 in Research Triangle Park, 
North Carolina. In addition to providing state-of-the-art information on the application of DNA or 
gene microarrays, the workshop catalyzed the formation of several collaborations, committees, and 
user's groups throughout the Research Triangle Park area and beyond. Potential applications of 
microarrays to toxicologic research and risk assessment include genome-wide expression analyses to 
identify gene-expression networks and toxicant-specific signatures that can be used to define mode 
of action, for exposure assessment, and for environmental monitoring. Arrays may also prove useful 
for monitoring genetic variability and its relationship to toxicant susceptibility in human popula- 
tions. Key words: DNA arrays, gene arrays, microarrays, toxicology. Environ Health Perspect 
107:681-685 (1999). [Online 6 July 1999] 
http://ehpnetl.niehs.nib.gw/docrfl999/J07^ 



Decoding the genetic blueprint is a dream that 
offers manifold returns in terms of understand- 
ing how organisms develop and function in an 
often hostile environment. With the rapid 
advances in molecular biology over the last 30 
years, the dream has come a step closer to reali- 
ty. Molecular biologists now have the ability to 
elucidate the composition of any genome. 
Indeed, almost 20 genomes have already been 
sequenced and more than 60 are currently 
under way. Foremost among these is the 
Human Genome Mapping Project. However, 
the genomes of a number of commonly used 
laboratory species are also under intensive 
investigation, including yeast, Arabidopsis, 
maize, rice, zebra fish, mouse, rat, and dog. It 
is widely expected that the completion of such 
programs will facilitate the development of 
many powerful new techniques and approach- 
es to diagnosing and treating genetically and 
environmentally induced diseases which afflict 
mankind. However, the vast amount of data 
being generated by genome mapping will 
require new high-throughput technologies to 
investigate the function of the millions of new 
genes that are being reported. Among the most 
widely heralded of the new functional 
genomics technologies are DNA arrays, which 
represent perhaps the most anticipated new 
molecular biology technique since polymerase 
chain reaction (PCR). 

Arrays enable the study of literally thou- 
sands of genes in a single experiment. The 
potential importance of arrays is enormous and 
has been highlighted by the recent publication 
of an entire Nature Genetics supplement dedi- 
cated to the technology (1). Despite this huge 
surge of interest, DNA arrays are still little used 
and largely unproven, as demonstrated by the 
high ratio of review and press articles to actual 
data papers. Even so, the potential they offer 
has driven venture capitalists into a frenzy of 
investment and many new companies are 
springing up to claim a share of this rapidly 
developing market. 

The U.S. Environmental Protection 
Agency (EPA) is interested in applying DNA 
array technology to ongoing toxicologic stud- 
ies. To learn more about the current state of 
the technology, the Reproductive Toxicology 
Division (RTD) of the National Health and 
Environmental Effects Research Laboratory 
(NHEERL; Research Triangle Park, NC) 
hosted a workshop on "Application of 
Microarrays to Toxicology" on 7-8 January 
1999 in Research Triangle Park, North 
Carolina. The workshop was organized by 
David Dix, Robert Kavlock, and John Rockett 
of the RTD/NHEERL. Twenty-two intra- 
mural and extramural scientists from govern- 
ment, academia, and industry shared informa- 
tion, data, and opinions on the current and 
future applications for this exciting new tech- 
nology. The workshop had more than 150 
attendees, including researchers, students, and 
administrators from the EPA, the National 
Institute of Environmental Health Sciences 
(NIEHS), and a number of other establish- 
ments from Research Triangle Park and 
beyond. Presentations ranged from the tech- 
nology behind array production through the 
sharing of actual experimental data and projec- 
tions on the future importance and applica- 
tions of arrays. The information contained in 
the workshop presentations should provide aid 
and insight into arrays in general and their 
application to toxicology in particular. 

Array Elements 

In the context of molecular biology, the word 
"array" is normally used to refer to a series of 
DNA or protein elements firmly attached in 
a regular pattern to some kind of supportive 
medium. DNA array is often used inter- 
changeably with gene array or microarray. 
Although not formally defined, microarray is 
generally used to describe the higher density 
arrays typically printed on glass chips. The 
DNA elements that make up DNA arrays 
can be oligonucleotides, partial gene 
sequences, or full-length cDNAs. Companies 
offering premade arrays that contain less 
than full-length clones normally use regions 
of the genes which are specific to that gene to 
prevent false positives arising through cross- 
hybridization. Sequence verification of 
cDNA clone identity is necessary because of 
errors in identifying specific clones from 
cDNA libraries and databases. Premade 
DNA arrays printed on membranes are currently 
or imminently available for human, 
mouse, and rat. In most cases they contain 
DNA sequences representing several thou- 
sand different sequence clusters or genes as 
delineated through the National Center for 
Biotechnology Information UniGene Project 
(2). Many of these different UniGene clusters 
(putative genes) are represented only by 
expressed sequence tags (ESTs). 

Array Printing 

Arrays are typically printed on one of two 
types of support matrix. Nylon membranes 
are used by most off-the-shelf array providers 
such as Clontech Laboratories, Inc. 
(Palo Alto, CA), Genome Systems, Inc. (St. 
Louis, MO), and Research Genetics, Inc. 
(Huntsville, AL). Microarrays such as those 
produced by Affymetrix, Inc. (Santa Clara, 
CA), Incyte Pharmaceuticals, Inc. (Palo Alto, 
CA), and many do-it-yourself (DIY) arraying 
groups use glass wafers or slides. Although 
standard microscope slides may be used, they 
must be preprepared to facilitate sticking 
of the DNA to the glass. Several different 
coatings have been successfully used, includ- 
ing silane and lysine. The coating of slides 
can easily be carried out in the laboratory, 
but many prefer the convenience of precoated 
slides available from suppliers. 

Once the support matrix has been pre- 
pared, the DNA elements can be applied by 
several methods. Affymetrix, Inc., has devel- 
oped a unique photolithographic technology 
for attaching oligonucleotides to glass wafers. 
More commonly, DNA is applied by either 
noncontact or contact printing. Noncontact 
printers can use thermal, solenoid, or piezoelec- 
tric technology to spray aliquots of solution 
onto the support matrix and may be used to 
produce slide or membrane-based arrays. 
Cartesian Technologies, Inc. (Irvine, CA) has 
developed nQUAD technology for use in its 
PixSys printers. The system couples a syringe 
pump with the microsolenoid valve, a combination 
that provides rapid quantitative dispensing 
of nanoliter volumes (down to 42 nL) over 
a variable volume range. A different approach 
to noncontact printing uses a solid pin and ring 
combination (Genetic MicroSystems, Inc., 
Woburn, MA). This system (Figure 1) allows a 
broader range of sample, including cell suspen- 
sions and particulates, because the printing 
head cannot be blocked up in the same way as 
a spray nozzle. Fluid transfer is controlled in 
this system primarily by the pin dimensions 
and the force of deposition, although the 
nature of the support matrix and the sample 
will also affect transfer to some degree. 

In contact printing, the pin head is dipped 
in the sample and then touched to the support 
matrix to deposit a small aliquot. Split pins 
were one of the first contact-printing devices 
to be reported and are the suggested format 
for DIY arrayers, as described by Brown (3). 
Split pins are small metal pins with a precise 
groove cut vertically in the middle of the pin 
tip. In this system, 1-48 split pins are posi- 
tioned in the pin-head. The split pins work by 
simple capillary action, not unlike a fountain 
pen — when the pin heads are dipped in the 
sample, liquid is drawn into the pin groove. A 
small (fixed) volume is then deposited each 
time the split pins are gently touched to 
the support matrix. Sample (100-500 pL 
depending on a variety of parameters) can be 
deposited on multiple slides before refilling is 
required, and array densities of > 2,500 
spots/cm^2 may be produced. The deposit volume 
depends on the split size, sample fluidity, 
and the speed of printing. Split pins are 
relatively simple to produce and can be made 
in-house if a suitable machine shop is avail- 
able. Alternatively, they can be obtained 
directly from companies such as TeleChem 
International, Inc. (Sunnyvale, CA). 

Irrespective of their source, printers 
should be run through a preprint sequence 
prior to producing the actual experimental 



arrays; the first 100 or so spots of a new run 
tend to be somewhat variable. Factors affecting 
spot reproducibility include slide treatment 
homogeneity, sample differences, and 
instrument errors. Other factors that come 
into play include clean ejection of the drop 
and clogging (nQUAD printing) and 
mechanical variations and long-term alter- 
ation in print-head surface of solid and split 
pins. However, with careful preparation it is 
possible to get a coefficient of variation for 
spot reproducibility below 10%. 
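
The reproducibility target above can be checked with a short calculation. The following is a minimal sketch, assuming that replicate spots of one sample have already been quantified; the intensity values and the function name are hypothetical and do not come from the workshop.

```python
import statistics


def coefficient_of_variation(intensities):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    mean = statistics.mean(intensities)
    if mean == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * statistics.stdev(intensities) / mean


# Hypothetical intensities for five replicate spots of the same sample.
replicate_spots = [1000.0, 1050.0, 950.0, 1020.0, 980.0]
cv_percent = coefficient_of_variation(replicate_spots)

# The text cites a CV below 10% as achievable with careful preparation.
meets_target = cv_percent < 10.0
```

A print run whose replicate spots exceed the chosen threshold would be a candidate for repeating the preprint sequence.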

One potential printing problem is sample 
carryover. Repeated washing, blotting, and 
drying (vacuum) of print pins between samples 
is normally effective at reducing sample carry- 
over to negligible amounts. Printing should 
also be carried out in a controlled environ- 
ment. Humidified chambers are available in 
which to place printers. These help prevent 
dust contamination and produce a uniform 
drying rate, which is important in determining 
spot size, quality, and reproducibility. 

In summary, although several printing 
technologies are available, none are par- 
ticularly outstanding and the bottom line 
is that they are still in a relatively early stage 
of evolution. 

Array Hybridization 

The hybridization protocol is, practically 
speaking, relatively straightforward and those 
with previous experience in blotting should 
have little difficulty. Array hybridizations 
are, in essence, reverse Southern/Northern 
blots — instead of applying a labeled probe to 
the target population of DNA/RNA, the 
labeled population is applied to the probe(s). 
With membrane-based arrays, the control and 
treated mRNA populations are normally converted 
to cDNA and labeled with isotope (e.g., 
33P) in the process. These labeled populations 
are then hybridized independently to parallel 
or serial arrays and the hybridization signal is 
detected with a phosphorimager. A less commonly 
used alternative to radioactive probes is 
enzymatic detection. The probe may be 
biotinylated, haptenylated, or have alkaline 
phosphatase/horseradish peroxidase attached. 
Hybridization is detected by enzymatic reac- 
tion yielding a color reaction (4). Differences 
in hybridization signals can be detected by eye 
or, more accurately, with the help of digital 
imaging and commercially available software. 
The labeling of the test populations for slide- 
based microarrays uses a slightly different 
approach. The probe typically consists of two 
samples of polyA+ RNA (usually from a treated 
and a control population) that are converted to 
cDNA; in the process each is labeled with a 
different fluor. The independently labeled 
probes are then mixed together and hybridized 
to a single microarray slide and the resulting 
combined fluorescent signal is scanned. After 

Figure 1. Genetic Microsystems (Woburn, MA) pin 
ring system for printing arrays. The pin ring com- 
bination consists of a circular open ring oriented 
parallel to the sample solution, with a vertical pin 
centered over the ring. When the ring is dipped 
into a solution and lifted, it withdraws an aliquot 
of sample held by surface tension. To spot the 
sample, the pin is driven down through the ring 
and a portion of the solution is transferred to the 
bottom of the pin. The pin continues to move 
downward until the pendant drop of solution 
makes contact with the underlying surface. The 
pin is then lifted, and gravity and surface tension 
cause deposition of the spot onto the array. 
Figure from Flowers et al. (14), with permission 
from Genetic Microsystems. 

normalization, it is possible to determine the 
ratio of fluorescent signals from a single 
hybridization of a slide-based microarray. 
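
As an illustration of the normalization and ratio step, the sketch below rescales one channel so that both channels have the same total signal and then reports a log2 ratio for each spot. Global (total-intensity) normalization is an assumption chosen here for simplicity; the text does not say which normalization method was used. The assignment of Cy5 to the treated sample and all intensity values are likewise hypothetical.

```python
import math


def log2_ratios(cy5, cy3):
    """Globally normalize two channels, then return per-spot log2(Cy5/Cy3)."""
    if len(cy5) != len(cy3):
        raise ValueError("both channels must cover the same spots")
    # Rescale Cy5 so that both channels have equal total intensity.
    scale = sum(cy3) / sum(cy5)
    return [math.log2((r * scale) / g) for r, g in zip(cy5, cy3)]


# Hypothetical background-subtracted spot intensities.
treated_cy5 = [200.0, 400.0, 800.0, 200.0]
control_cy3 = [100.0, 200.0, 100.0, 400.0]

ratios = log2_ratios(treated_cy5, control_cy3)
# After scaling: spots 0 and 1 are unchanged, spot 2 is 4-fold up,
# and spot 3 is 4-fold down.
```

Working on a log2 scale makes induction and repression symmetric: a 4-fold increase and a 4-fold decrease give +2 and -2, respectively.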

cDNA derived from control and treated 
populations of RNA is most commonly 
hybridized to arrays, although subtractive 
hybridization or differential display reactions 
may also be used. Fluorophore- or radiolabeled 
nucleotides are directly incorporated 
into the cDNA in the process of converting 
RNA to cDNA. Alternatively, 5' end-labeled 
primers may be used for cDNA synthesis. 
These are labeled with a fluorophore for 
direct visualization of the hybridized array. 
Alternatively, biotin or a hapten may be 
attached to the primer, in which case fluor- 
labeled streptavidin or antibody must be 
applied before a signal can be generated. The 
most commonly used fluorophores at present 
are cyanine (Cy)3 and Cy5 (Amersham 
Pharmacia Biotech AB, Uppsala, Sweden). 
However, the relative expense of these fluo- 
rescent conjugates has driven a search for 
cheaper alternatives. Fluorescein, rhodamine, 
and Texas red have all been used, and 
companies such as Molecular Probes, Inc. 
(Eugene, OR) are developing a series of 
labeled nucleotides with a wide range of exci- 
tation and emission spectra which may prove 
to function as well as the Cy dyes. 

Table 1. Advantages and disadvantages of different microarray scanning systems. 

                CCD camera system                      Nonconfocal laser scanner          Confocal laser scanner 
Advantages      Few moving parts                       Relatively simple optics           Small depth of focus reduces artifacts 
                Fast scanning of bright samples                                           May have high light collection efficiency 
Disadvantages   Less appropriate for dim samples       Low light collection efficiency    Small depth of focus requires scanning precision 
                Optical scatter can limit performance  Background artifacts not rejected 
                                                       Resolution typically low 


Analysis of DNA Microarrays 

Membrane-based arrays are normally analyzed 
on film or with a phosphorimager, whereas 
chip-based arrays require more specialized scan- 
ning devices. These can be divided into three 
main groups: the charge-coupled device camera 
systems, the nonconfocal laser scanners, and the 
confocal laser scanners. The advantages and disadvantages 
of each system are listed in Table 1. 

Because a typical spot on a microarray can 
contain > 10^8 molecules, it is clear that a large 
variation in signal strength may occur. 
Current scanners cannot work across this 
many orders of magnitude (4 or 5 is more typ- 
ical). However, the scanning parameters can 
normally be adjusted to collect more or less 
signal, such that two or three scans of the same 
array should permit the detection of rare and 
abundant genes. 
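
One way to combine two scans of the same array is sketched below. It assumes a 16-bit detector that saturates at 65,535 counts and a known signal ratio between the high- and low-sensitivity scans; neither figure comes from the text, and the function and variable names are hypothetical.

```python
SATURATION = 65535  # assumed 16-bit detector ceiling (not stated in the text)


def merge_scans(low_gain, high_gain, gain_ratio):
    """Merge two scans of one array into low-gain units.

    The high-gain reading is preferred because it resolves dim spots,
    unless it is saturated, in which case the low-gain reading is used.
    """
    merged = []
    for low, high in zip(low_gain, high_gain):
        if high < SATURATION:
            merged.append(high / gain_ratio)
        else:
            merged.append(float(low))
    return merged


# Hypothetical readings for a rare, a moderate, and an abundant transcript.
low_scan = [10, 500, 40000]
high_scan = [100, 5000, 65535]  # the abundant spot saturates at high gain

merged = merge_scans(low_scan, high_scan, gain_ratio=10.0)
```

In practice the gain ratio would itself have to be estimated, for example from spots that are unsaturated in both scans.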

When a microarray is scanned, the fluores- 
cent images are captured by software normally 
included with the scanner. Several commercial 
suppliers provide additional software for quan- 
tifying array images, but the software tools are 
constantly evolving to meet the developing 
needs of researchers, and it is prudent to 
define one's own needs and clarify the exact 
capabilities of the software before its purchase. 
Issues that should be considered include the 
following: 

• Can the software locate offset spots? 

• Can it quantitate across irregular hybridization 
signals? 

• Can the arrayed genes be programmed in for 
easy identification and location? 

• Can the software connect via the Internet to 
databases containing further information on 
the gene(s) of interest? 

One of the key issues raised at the work- 
shop was the sensitivity of microarray technol- 
ogy. Experiments by General Scanning, Inc. 
(Watertown, MA), have shown that by using 
the Cy dyes and their scanner, signal can be 
detected down to levels of < 1 fluor molecule 
per square micrometer, which translates to 
detecting a rare message at approximately one 
copy per cell or less. 

Array Applications 

Although arrays are an emerging technology 
certain to undergo improvement and 
alteration, they have already been applied usefully 
to a number of model systems. Arrays are 
at their most powerful when they contain the 
entire genome of the species they are being 
used to study. For this reason, they have strong 
support among researchers utilizing yeast and 
Caenorhabditis elegans (5). The genomes of 
both of these species have been sequenced and, 
in the case of yeast, deposited onto arrays for 
examination of gene expression (6,7). With 
both of these species, it is relatively easy to 
perturb individual gene expression. Indeed, C. 
elegans knockouts can be made simply by 
soaking the worms in an antisense solution of 
the gene to be knocked out. 

Table 1 footnote: CCD, charge-coupled device. From Kawasaki (13). 

By a process of systematic gene disrup- 
tion, it is now possible to examine the cause 
and effect relationships between different 
genes in these simple organisms. This kind of 
approach should help elucidate biochemical 
pathways and genetic control processes, 
deconvolve polygenic interactions, and 
define the architecture of the cellular network. 
A simple case study of how this can be 
achieved was presented by Butow [University 
of Texas Southwestern Medical Center, 
Dallas, TX (Figure 2)]. Although it is the 
phenotypic result of a single gene knockout 
that is being examined, the effect of such 
perturbation will almost always be polygenic. 
Polygenic interactions will become increasingly 
important as researchers begin to move 
away from single gene systems when examin- 
ing the nature of toxicologic responses to 
external stimuli. This is especially important 
in toxicology because the phenotype pro- 
duced by a given environmental insult is 
never the result of the action of a single gene; 
rather, it is a complex interaction of one or 
multiple cellular pathways. Phenomena such 
as quantitative trait (the continuous variation 
of phenotype), epistasis (the effect of alleles of 
one or more genes on the expression of other 
genes), and penetrance (proportion of indi- 
viduals of a given genotype that display a par- 
ticular phenotype) will become increasingly 
evident and important as toxicologists push 
toward the ultimate goal of matching the 
responses of individuals to different 
environmental stimuli. 

Analysis of the transcriptome (the expres- 
sion level of all the genes in a given cell popula- 
tion) was a use of arrays addressed by several 
speakers. Unfortunately, current gene nomen- 
clature is often confusing in that single genes 
are allocated multiple names (usually as a result 
of independent discovery by different laborato- 
ries), and there was a call for standardization of 
gene nomenclature. Nevertheless, once a tran- 
scriptome has been assembled it can then be 
transferred onto arrays and used to screen any 
chosen system. The EPA MicroArray 
Consortium (EPAMAC) is assembling testes 



transcriptomes for human, rat, and mouse. In a 
slightly different approach, Nuwaysir et al. (6) 
describe how the NIEHS assembled what is 
effectively a "toxicological transcriptome" — a 
library of human and mouse genes that have 
previously been proven or implicated in 
responses to toxicologic insults. Clontech 
Laboratories, Inc. (Palo Alto, CA), has begun a 
similar process by developing stress/toxicology 
filter arrays of rat, mouse, and human genes. 
Thus, rather than being tissue or cell specific, 
these stress/toxicology arrays can be used across 
a variety of model systems to look for alter- 
ations in the expression of toxicologically 
important genes and define the new field of 
toxicogenomics. The potential to identify toxi- 
cant families based on tissue- or cell-specific 
gene expression could revolutionize drug test- 
ing. These molecular signatures or fingerprints 
could not only point to the possible 
toxicity/carcinogenicity of newly discovered 
compounds (Figure 3), but also aid in elucidat- 
ing their mechanism of action through identifi- 
cation of gene expression networks. By exten- 
sion, such signatures could provide easily iden- 
tifiable biomarkers to assess the degree, time, 
and nature of exposure. 
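
The signature-matching idea can be sketched as a comparison between a test compound's expression profile and reference profiles for known toxicant families. The example below uses Pearson correlation with a cutoff of 0.9; both the similarity measure and the cutoff are assumptions made for illustration, and all profiles and family names other than peroxisome proliferators are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_match(profile, signatures, threshold=0.9):
    """Return the best-correlated family name, or None if below threshold."""
    name, score = max(
        ((family, pearson(profile, sig)) for family, sig in signatures.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else None


# Hypothetical log-ratio signatures over the same four genes.
signatures = {
    "peroxisome proliferator": [2.0, -1.0, 0.0, 1.5],
    "hypothetical family B": [-1.0, 2.0, 1.0, 0.0],
}
compound_1 = [1.9, -1.1, 0.1, 1.4]
compound_2 = [0.0, 0.1, -2.0, 0.2]

match_1 = best_match(compound_1, signatures)
match_2 = best_match(compound_2, signatures)
```

Here the first compound resembles the peroxisome proliferator signature, whereas the second matches no reference family, which mirrors the scenario described for Figure 3.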

DNA arrays are primarily a tool for exam- 
ining differential gene expression in a given 
model. In this context they are referred to as 
closed systems because they lack the ability of 
other differential expression technologies, e.g., 
differential display and subtractive hybridization, 
to detect previously unknown genes not 
present on the array. This would appear to 
limit the power of DNA arrays to the imagina- 
tions and preconceptions of the researcher in 
selecting genes previously characterized and 
thought to be involved in the model system. 
However, the various genome sequencing projects 
have created a new category of 
sequence — the EST — that has partially molli- 
fied this deficiency. ESTs are cDNAs expressed 
in a given tissue that, although they may share 
some degree of sequence similarity to previous- 
ly characterized genes, have not been assigned 
specific genetic identity. By incorporating EST 
clones into an array, it is possible to monitor 
the expression of these unknown genes. This 
can enable the identification of previously 
uncharacterized genes that may have biologic 
significance in the model system. Filter arrays 
from Research Genetics and slide arrays from 
Incyte Pharmaceuticals both incorporate large 
numbers of ESTs from a variety of species. 

A further use of microarrays is the identifi- 
cation of single nucleotide polymorphisms 
(SNPs). These genomic variations are abun- 
dant — they occur approximately every 1 kb or 
so — and are the basis of restriction fragment 
length polymorphism analysis used in forensic 
analysis. Affymetrix, Inc., designed chips that 
contain multiple repeats of the same gene 
sequence. Each position is present with all four 
possible bases. After the hybridization of the 
sample, the degree of hybridization to the dif- 
ferent sequences can be measured and the exact 
sequence of the target gene deduced. SNPs are 
thought to be of vital importance in drug 
metabolism and toxicology. For example, sin- 
gle base differences in the regulatory region or 
active site of some genes can account for huge 
differences in the activity of that gene. Such 
SNPs are thought to explain why some people 
are able to metabolize certain xenobiotics bet- 
ter than others. Thus, arrays provide a further 
tool for the toxicologist investigating the 
nature of susceptible subpopulations and toxi- 
cologic response. 
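
The readout of such a resequencing chip can be sketched as follows: at each position the base whose probe gives the strongest hybridization signal is called, and the called sequence is then compared with a reference. This is a simplified illustration rather than the vendor's actual algorithm; the intensities, the reference sequence, and the function names are hypothetical.

```python
def call_sequence(intensities):
    """Call, at each position, the base with the highest hybridization signal.

    `intensities` is a list with one dict per position, mapping each of
    'A', 'C', 'G', 'T' to the signal of the corresponding probe.
    """
    return "".join(max(position, key=position.get) for position in intensities)


def find_variants(called, reference):
    """Return (index, reference_base, called_base) for every mismatch."""
    return [
        (i, ref, obs)
        for i, (ref, obs) in enumerate(zip(reference, called))
        if ref != obs
    ]


# Hypothetical probe signals for a four-base stretch (0-based positions).
signals = [
    {"A": 900, "C": 50, "G": 40, "T": 30},
    {"A": 20, "C": 800, "G": 60, "T": 35},
    {"A": 700, "C": 30, "G": 100, "T": 40},  # reference base is G
    {"A": 25, "C": 45, "G": 30, "T": 850},
]
reference = "ACGT"

called = call_sequence(signals)
variants = find_variants(called, reference)
```

A real analysis would also need to handle ambiguous positions, for instance heterozygous samples in which two probes give comparable signals.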

There are still many wrinkles to be ironed 
out before arrays become a standard tool for 
toxicologists. The main issues raised at the 
workshop by those with hands-on experience 
were the following: 

• Expense: the cost of purchasing/contracting 
this technology is still too great for many 
individual laboratories. 




Figure 2. Potential effects of gene knockout within 
positively and negatively regulated gene expression 
networks. /, is limiting in wild type for expression of 
^. (A) A simple, two-component, linear regulatory 
network operating on gene ^, where /, is a positive 
effector of ^ and j n is either a positive or negative 
effector of i y This network could be deduced by 
examining the consequence of (B) deleting j n on the 
expression of /, and where the expression of ^ 
would be decreased or increased depending on 
whether j n was a positive or negative regulator. 
These and other connected components of even 
greater complexity could be revealed by genome- 
wide expression analysis. From Butow (15). 



• Clones: the logistics of identifying, obtaining, 
and maintaining a set of nonredundant, noncontaminated, 
sequence-verified, species/cell/tissue/field-specific clones. 

• Use of inbred strains: where whole-organism 
models are being used, the use of inbred 
strains is important to reduce the potentially 
confusing effects of the individual variation 
typically seen in outbred populations. 

• Probe: the need for relatively large amounts 
of RNA, which limits the type of sample 
(e.g., biopsy) that can be used. Also, different 
RNA extraction methods can give different 
results. 

• Specificity: the ability to discriminate accurately 
between closely related genes (e.g., the 
cytochrome P450 family) and splice variants. 

• Quantitation: the quantitation of gene 
expression using gene arrays is still open to 
debate. One reason for this is the different 
incorporation of the labeling dyes. However, 
the main difficulty lies in knowing what to 
normalize against. One option is to include a 
large number of so-called housekeeping genes 
in the array. However, the expression of these 
genes often changes depending on the tissue 
and the toxicant, so it is necessary to characterize 
the expression of these genes in the 
model system before utilizing them. This is 
clearly not a viable option when screening 
multiple new compounds. A second option 
is to include on the array genes from a nonrelated 
species (e.g., a plant gene on an animal 
array) and to spike the probe with synthetic 
RNA(s) complementary to the gene(s). 

• Reproducibility: this is sometimes question- 
able, and a figure of approximately two or 
three repeats was used as the minimum num- 
ber required to confirm initial findings. 



Again, however, most people advocated the 
use of Northern blots or reverse transcriptase 
PCR to confirm findings. 

• Sensitivity: concerns were voiced about the 
number of target molecules that must be pre- 
sent in a sample for them to be detected on 
the array. 

• Efficiency: reproducible identification of 1.5- 
to 2-fold differences in expression was reported, 
although the number of genes that 
undergo this level of change and remain 
undetected is open to debate. It is important 
that this level of detection be ultimately 
achieved because it is commonly perceived 
that some important transcription factors 
and their regulators respond at such low lev- 
els. In most cases, 3- to 5-fold was the mini- 
mum change that most were happy to 
accept. 

• Bioinformatics: perhaps the greatest concern 
was how to interpret the data with 
the greatest accuracy and efficiency. The 
biggest headache is trying to identify net- 
works of gene expression that are common to 
different treatments or doses. The amount of 
data from a single experiment is huge. It may 
be that, in the future, several groups individually 
equipped with specialized software algorithms 
for studying their favorite genes or 
gene systems will be able to share the same 
hybridized chips. Thus, arrays could usher in 
a new perspective on collaboration and the 
sharing of data. 
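
The Quantitation and Efficiency items above can be combined into a small worked example: each array is normalized to a spiked-in control from a nonrelated species, and genes are flagged when the treated/control fold change reaches a chosen minimum. The sketch below uses a 3-fold cutoff, the lower end of the range that most attendees were willing to accept; the gene names, signal values, and the `plant_spike` control are hypothetical.

```python
def normalize_to_spike(signals, spike_name):
    """Divide every gene's signal by the spiked-in control signal."""
    spike = signals[spike_name]
    return {gene: value / spike for gene, value in signals.items()
            if gene != spike_name}


def flag_changes(control, treated, spike_name="plant_spike", min_fold=3.0):
    """Return {gene: fold change} for genes changed at least `min_fold`
    in either direction after spike-in normalization."""
    norm_control = normalize_to_spike(control, spike_name)
    norm_treated = normalize_to_spike(treated, spike_name)
    flagged = {}
    for gene, baseline in norm_control.items():
        fold = norm_treated[gene] / baseline
        if fold >= min_fold or fold <= 1.0 / min_fold:
            flagged[gene] = fold
    return flagged


# Hypothetical raw signals; the treated array was labeled twice as
# efficiently, which the spike-in control corrects for.
control_array = {"plant_spike": 100.0, "geneA": 50.0, "geneB": 200.0, "geneC": 80.0}
treated_array = {"plant_spike": 200.0, "geneA": 400.0, "geneB": 420.0, "geneC": 40.0}

flagged = flag_changes(control_array, treated_array)
```

Lowering `min_fold` toward 1.5 or 2 would capture the subtler changes discussed above, at the cost of more false positives unless sufficient repeats are run.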

EPAMAC 

Perhaps the main reason most scientists are 
unable to use array technology is the high cost 
involved, whether buying off-the-shelf mem- 
branes, using contract printing services, or 

Figure 3. Gene expression profiles, also called fingerprints or signatures, of known toxicants or 
toxicant families may, in the future, be used to identify the potential toxicity of new drugs, etc. In this 
example, the genetic signature of test compound 1 is identical to that of known peroxisome proliferators, 
whereas that of test compound 2 does not match any known toxicant family. Based on these results, test 
compound 2 would be retained for further testing and test compound 1 would be eliminated. 

producing chips in-house. In view of this, 
researchers at the RTD/NHEERL initiated 
the EPAMAC. This consortium brings 
together scientists from the EPA and a num- 
ber of extramural labs with the aim of devel- 
oping microarray capability through the shar- 
ing of resources and data. EPAMAC 
researchers are primarily interested in the 
developmental and toxicologic changes seen 
in testicular and breast tissue, and a portion 
of the workshop was set aside for EPAMAC 
members to share their ideas on how the 
experimental application of microarrays could 
facilitate their research. One of the central 
areas of interest to EPAMAC members is the 
effect of xenobiotics on male fertility and 
reproductive health. Of greatest concern is 
the effect of exposure during critical periods 
of development and germ cell differentiation 
(9), and how this may compromise sperm 
counts and quality following sexual maturation 
(10). As well as spermatogenic tissue, 
there is also interest in how residual mRNA 
found in mature sperm (11) could be used as 
an indicator of previous xenobiotic effects (it 
is easier to obtain a semen sample than a tes- 
ticular biopsy). Arrays will be used to examine 
and compare the effect of exposure to heat 
and chemicals in testicular and epididymal 
gene expression profiles, with the aim of 
establishing relationships/associations 
between changes in developmental landmarks 
and the effects on sperm count and quality. 
Cluster, pattern, and other analysis of such 
data should help identify hidden relationships 
between genes that may reveal potential 
mechanisms of action and uncover roles for 
genes with unknown functions. 

Summary 

The full impact of DNA arrays may not be 
seen for several years, but the interest shown at 
this regional workshop indicates the high level 
of interest that they foster. Apart from educat- 
ing and advertising the various technologies in 
this field, this workshop brought together a 
number of researchers from the Research 
Triangle Park area who are already using DNA 
arrays. The interest in sharing ideas and experi- 
ences led to the initiation of a Triangle array 
user's group. 

Array technology is still in its infancy. This 
means that the hardware is still improving and 
there is no current consensus for standard procedures, 
quantitation, and interpretation. 
Consistency in spotting and scanning arrays is 
not yet optimized, and this is one of the most 
critical requirements of any experiment. In 
addition, one of the dark regions of array technology, 
strife in the courts over who owns 
what portions of it, has further muddled the 
future and is a potential barrier toward the 
development of consensus procedures. 

Perhaps the greatest hurdle for the application 
of arrays is the actual interpretation of 
data. No specialists in bioinformatics attended 
the workshop, largely because they are rare and 
because as yet no one seems clear on the best 
method of approaching data analysis and interpretation. 
Cross-referencing results from multiple 
experiments (time, dose, repeats, different 
animals, different species) to identify commonly 
expressed genes is a great challenge. In most 
cases, we are still a long way from understanding 
how the expression of gene X is related to 
the expression of gene Y, and ordering gene 
expression to delineate causal relationships. 

To the ordinary scientist in the typical laboratory, 
however, the most immediate problem 
is a lack of affordable instrumentation. 
One can purchase premade membranes at 
relatively affordable prices. Although these 
may be useful in identifying individual genes 
to pursue in more detail using other methods, 
the numbers that would be required for even a 
small routine toxicology experiment prohibit 
this as a truly viable approach. For the toxicologist, 
there is a need to carry out multiple 
experiments: dose responses, time curves, 
multiple animals, and repeats. Glass-based 
DNA arrays are most attractive in this context 
because they can be prepared in large batches 
from the same DNA source and accommodate 
control and treated samples on the same 
chip. Another problem with current off-the-shelf 
arrays is that they often do not contain 
one or more of the particular genes a group is 
interested in. One alternative is to obtain 
or produce a set of custom clones and have 
contract printing of membranes or slides 
carried out by a company such as Genomic 
Solutions, Inc. (Ann Arbor, MI). This approach 
is less expensive than laying out capital for 
one's own entire system, although at some 
point it might make economic sense to print 
one's own arrays. 

Finally, DNA arrays are currently a team 
effort. They are a technology that uses a wide 
range of skills including engineering, statistics, 
molecular biology, chemistry, and bioinformatics. 
Because most individuals are skilled in 
only one or perhaps two of these areas, it 
appears that success with arrays may be best 
expected by teams of collaborators consisting 
of individuals having each of these skills. 

Those considering array applications may 
be amused or goaded on by the following 
quote from Fortune magazine (12): 

Microprocessors have reshaped our economy, 
spawned vast fortunes and changed the way we live. 
Gene chips could be even bigger. 

Although this comment may have been 
designed to excite the imagination rather than 
accurately reflect the truth, it is fair to say that 
the age of functional genomics is upon us. 
DNA arrays look set to be an important tool in 
this new age of biotechnology and will likely 
contribute answers to some of toxicology's 
most fundamental questions. 

Subject: RE: [Fwd: Toxicology Chip] 
Date: Mon, 3 Jul 2000 08:09:45 -0400 
From: "Afshari.Cynthia" <afshari@niehs.nih.gov> 
To: "'Diana Hamlet-Cox'" <dianahc@incyte.com> 

You can see the list of clones that we have on our 12 K chip at 
http://manuel.niehs.nih.gov/maps/guest/clonesrch.cfm 

We selected a subset of genes (2000K) that we believed critical to tox 
response and basic cellular processes and added a set of clones and ESTs to 
this. We have included a set of control genes (80+) that were selected by 
the NHGRI because they did not change across a large set of array 
experiments. However, we have found that some of these genes change 
significantly after tox treatments and are in the process of looking at the 
variation of each of these 80+ genes across our experiments. 
Our chips are constantly changing and being updated and we hope that our 
data will lead us to what the toxchip should really be. 
I hope this answers your question. 
Cindy Afshari 

> 
> From: Diana Hamlet-Cox 
> Sent: Monday, June 26, 2000 8:52 PM 
> To: afshari@niehs.nih.gov 
> Subject: [Fwd: Toxicology Chip] 
> 
> Dear Dr. Afshari, 
> 
> Since I have not yet had a response from Bill Grigg, perhaps he was not 
> the right person to contact. 
> 
> Can you help me in this matter? I don't need to know the sequences, 
> necessarily, but I would like very much to know what types of sequences 
> are being used, e.g., GPCRs (more specific?), ion channels, etc. 
> 
> Diana Hamlet-Cox 
> 
> Original Message 
> Subject: Toxicology Chip 
> Date: Mon, 19 Jun 2000 18:31:48 -0700 
> From: Diana Hamlet-Cox <dianahc@incyte.com> 
> Organization: Incyte Pharmaceuticals 
> To: grigg@niehs.nih.gov 
> 
> Dear Colleague: 
> 
> I am doing literature research on the use of expressed genes as 
> pharmacotoxicology markers, and found the Press Release dated February 
> 29, 2000 regarding the work of the NIEHS in this area. I would like to 
> know if there is a resource I can access (or you could provide?) that 
> would give me a list of the 12,000 genes that are on your Human ToxChip 
> Microarray. In particular, I am interested in the criteria used to 
> select sequences for the ToxChip, including any control sequences 
> included in the microarray. 
> 
> Thank you for your assistance in this request. 
> 
> Diana Hamlet-Cox, Ph.D. 
> Incyte Genomics, Inc. 
> 
> -- 
> 
Proteomics: a major new
technology for the drug 
discovery process 

Martin J. Page, Bob Amess, Christian Rohlff, Colin Stubberfield 
and Raj Parekh 



Proteomics is a new enabling technology that is being 
integrated into the drug discovery process. This will 
facilitate the systematic analysis of proteins across any 
biological system or disease, forwarding new targets 
and information on mode of action, toxicology and sur- 
rogate markers. Proteomics is highly complementary to 
genomic approaches in the drug discovery process and, 
for the first time, offers scientists the ability to integrate 
information from the genome, expressed mRNAs, their 
respective proteins and subcellular localization. It is ex- 
pected that this will lead to important new insights into 
disease mechanisms and improved drug discovery 
strategies to produce novel therapeutics. 

Among the major pharmaceutical and biotechnol- 
ogy companies, it is clearly recognized that the 
business of modern drug discovery is a highly 
competitive process. All of the many steps in- 
volved are inherently complex, and each can involve a 
high risk of attrition. The players in this business strive 
continuously to optimize and streamline the process; each 
seeking to gain an advantage at every step by attempting 
to make informed decisions at the earliest stage possible. 
The desired outcome is to accelerate as many key activities 
in the drug discovery process as possible. This should produce
a new generation of robust drugs that offer a high
probability of success and reach the clinic and market 
ahead of the competition. 

There has been noticeable emphasis over recent years 
for companies to aggressively review and refine their 
strategies to discover new drugs. Central to this has been 
the introduction and implementation of cutting-edge 
technologies. Most, if not all, companies have now inte- 
grated key technology platforms that incorporate gen- 
omics, mRNA expression analysis, relational databases, 
high-throughput robotics, combinatorial chemistry and 
powerful bioinformatics. Although it is still early days to 
quantify the real impact of these platforms in clinical and 
commercial terms, expectations are high, and it is widely 
accepted that significant benefits will be forthcoming. This 
is largely based on data obtained during preclinical studies 
where the genomic 1,2 and microarray 3,4 technologies have 
already proved their value. 

However, there are several noteworthy outcomes that re- 
sult from this. Many comments are voiced that scientists 
armed with these technologies are now commonly faced 
with data overload. Thus, in some instances, rather than 
facilitating the decision process, the accumulation of more 
complex data points, many with unknown consequences, 
can seem to hinder the process. Also, most drug compa- 
nies have simultaneously incorporated very similar compo- 
nents of the new technology platforms, the consequence 
being that it is becoming difficult yet again to determine 
where a clear competitive advantage will arise. Finally, in 
recent years, largely as a result of the accessibility of the 
technologies, there has been an overwhelming emphasis 
placed on genomic and mRNA data rather than on protein 
analysis. It is important to remember that proteins dictate
biological phenotype - whether it is normal or diseased -
and are the direct targets for most drugs.

Figure 1. Steps involved in analysing a biological sample by proteomics:
sample; 2D gels and imaging; curation and interrogation; differential
analysis (Proteograph™); mass spectrometry and annotation. MCI, molecular
cluster index.

Proteomics: new technology for
the analysis of proteins 

It is now timely to recognize that complementary technol- 
ogy in the form of high-throughput analysis of the total 
protein repertoire of chosen biological samples, namely 
proteomics, is poised to add a new and important dimen- 
sion to drug discovery. In a similar fashion to genomics, 
which aims to profile every gene expressed in a cell, pro- 
teomics seeks to profile every protein that is expressed 5-7 . 
However, there is added information, since proteomics can 
also be used to identify the post-translational modifications 
of proteins 8 , which can have profound effects on bio- 
logical function, and their cellular localization. Importantly, 
proteomics is a technology that integrates the significant 
advances in two-dimensional (2D) electrophoretic separa- 
tion of proteins, mass spectrometry and bioinformatics. 
With these advances it is now possible to consistently de- 
rive proteomes that are highly reproducible and suitable 
for interrogation using advanced bioinformatic tools. 

There are many variations whereby different laboratories 
operate proteomics. For the purpose of this review, the
process used at Oxford GlycoSciences (OGS), which uses
an industrial-scale operation that is integral to its drug dis- 
covery work, will be described. The individual steps of 
this process, where up to 1000 2D gels can be run and 
analysed per week, are summarized in Fig. 1. The incom- 
ing samples are bar coded and all information relevant to 
the sample is logged into a Laboratory Information 
Management System (LIMS) database. There can be a wide 
range in the type of samples processed, as applicable to 
individual steps in the drug discovery pipeline, and these 
will be mentioned later. The samples are separated according
to their charge (pI) in the first dimension, using isoelectric
focusing, followed by size (MW) using SDS-PAGE
in the second dimension. Many modifications have been 
made to these steps to improve handling, throughput and 
reproducibility. The separated proteins are then stained 
with fluorescent dyes which are significantly more sensi- 
tive in detection than standard silver methods and have a 
broader dynamic range. The image of the displayed pro- 
teins obtained is referred to as the proteome, and is digi- 
tally scanned into databases using proprietary software 
called ROSETTA™. The images are subsequently curated, 
which begins with the removal of any artefacts, cropping 
and the placement of pI/MW landmarks. The images from 
replicate images are then aligned and matched to one 
another to generate a synthetic composite image. This is
an important step, as the proteome is a dynamic situation, 
and it captures the biological variation that occurs, such 
that even orphan proteins are still incorporated into the 
analysis. 

By means of illustration, Fig. 1 shows the process 
whereby proteomes are generated from normal and dis- 
ease samples and how differentially expressed proteins are 
identified. The potential of this type of analysis is tremen- 
dous. For example, from a mammalian cell sample, in ex- 
cess of 2000 proteins can typically be resolved within the 
proteome. The quality of this is shown in Fig. 2, which 
shows representative proteomes from three diverse bio- 
logical sources: human serum, the pathogenic fungus 
Candida albicans and the human hepatoma cell line 
Huh7. 

Use of proteomics to identify 
disease specific proteins

In most cases, the drug discovery process is initiated by 
the identification of a novel candidate target — almost al- 
ways a protein - that is believed to be instrumental in the 
disease process. To date, there is a variety of means
whereby drug targets have been forthcoming. These in- 
clude molecular, cellular and genomic approaches, mostly 
centred upon DNA and mRNA analysis. The gene in ques- 
tion is isolated, and expression and characterization of its 
coded protein product - i.e. the drug target - is invariably 
a secondary event. 

With the proteomic approach, the starting point is at the 
other end of the 'telescope'. Here there is direct and immediate
comparison of the proteomes from paired normal
and disease materials. Examples of these pairs are: (1) pu- 
rified epithelial cell populations derived from human 
breast tumours, matched to purified normal populations of 
human breast epithelial cells, and (2) the invading patho- 
genic hyphal form of C. albicans, matched to the non- 
invading yeast form of C. albicans. When the proteome 
images from each pair are aligned, the Proteograph™ soft- 
ware is able to rapidly identify those proteins (each refer- 
enced as having a unique molecular cluster index, or MCI) 
that are either unique, or those that are differentially ex- 
pressed. Thus, the Proteograph output from this analysis is 
both qualitative and quantitative. 

Proteograph analysis for a particular study can also be 
undertaken on any number of samples. For example, one 
might compare anything from a few to several hundred 
preparations or samples, each from a normal and disease 
counterpart, and have these analysed in a single 
Proteograph study. In this way, it is possible to assign 
strong statistical confidence to the data and in some in- 
stances to identify specific subpopulations within the input 
biological sources. This feature will become increasingly 
significant in the near future, and there is a clear synergy 
here whereby proteomics can work closely with pharma- 
cogenomic approaches to stratify patient populations and 
achieve effective targeted care for the patient. Whatever 
the source of the materials, the net output of Proteograph 
analysis is immediate identification of disease specific pro- 
teins. This is shown in Fig. 3, which shows the results of 
a proteograph obtained by comparing untreated human 
hepatoma cells with cells following exposure to a clinical 
cytotoxic agent. In this instance, only the top 20 differentially
expressed MCIs are shown, but the readout would
normally extend to a defined cut-off value, typically a twofold
or greater difference in expression levels, determined
by the user.

Figure 2. Representative proteomes obtained from (a) human serum, (b) the pathogenic fungus Candida albicans
and (c) the human hepatoma cell line Huh7.

Figure 3. Table of differential protein expression
profiles, referred to as a Rosetta Proteograph™,
between Huh7 cells with and without the cytotoxic
agent 5-FU. Foregrounds: Huh7 cells treated with 5-FU.
Backgrounds: Huh7 cells untreated. Bars mark features
upregulated or downregulated in Huh7 cells treated with
5-FU with respect to untreated Huh7 cells. Bars are
quantized and do not represent exact fold change values.
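
The cut-off step described in the text (a twofold or greater difference in expression, set by the user) can be sketched in a few lines of Python. This is an illustration only and not the Proteograph software: the function name `differential_mcis`, the MCI labels and the intensity values are all hypothetical, and the default threshold of 2.0 is simply the typical value mentioned above.

```python
def differential_mcis(treated, untreated, cutoff=2.0):
    """Split features into differentially expressed and unique ones.

    `treated` and `untreated` map a (hypothetical) MCI label to a positive
    intensity. Returns a pair (changed, unique):
      changed: {mci: fold change} for features present in both samples whose
               treated/untreated ratio is >= cutoff or <= 1/cutoff
      unique:  sorted list of features seen in only one of the two samples
    """
    changed = {}
    unique = []
    for mci in sorted(set(treated) | set(untreated)):
        if mci not in treated or mci not in untreated:
            unique.append(mci)
            continue
        fold = treated[mci] / untreated[mci]
        if fold >= cutoff or fold <= 1.0 / cutoff:
            changed[mci] = fold
    return changed, unique


# Made-up intensities, for illustration only.
treated = {"MCI-1": 400.0, "MCI-2": 90.0, "MCI-3": 50.0, "MCI-4": 10.0}
untreated = {"MCI-1": 100.0, "MCI-2": 100.0, "MCI-3": 200.0, "MCI-5": 30.0}
changed, unique = differential_mcis(treated, untreated)
print(changed)  # features passing the twofold cut-off
print(unique)   # features found in only one sample
```

Treating "unique" features separately avoids dividing by a missing intensity, and mirrors the distinction the text draws between unique and differentially expressed proteins.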

In a typical analysis involving disease and normal mammalian
material, in which each proteome would have
~2000 protein features each assigned an MCI, the proteograph
might identify somewhere in the region of 50-300
MCIs that are unique or differentially expressed. To capitalize
rapidly on these data, OGS applies a high-throughput
mass spectrometry facility, coupled to advanced databases,
to annotate these MCIs as individual proteins. As
these are all disease specific proteins, each could represent 
a novel target and/or a novel disease marker. The process 
becomes even more powerful when a panel of features, 
rather than individual features, are assigned. The relevance 
of this is apparent when one considers that most diseases, 
if not all, are multifactorial in nature and arise from poly- 
genic changes. Rather than analysing events in isolation, 
the ability to examine hundreds or thousands of events 
simultaneously, as shown by proteomics, can offer real 
advantages. 

Identification and assignment of candidate targets 
The rapid identification and assignment of candidate tar- 
gets and markers represents a huge challenge, but this has 
been greatly facilitated by combining the recent advances 
made in proteomics and analytical mass spectrometry 9 . 
Using automated procedures it is now possible to annotate 
proteins present in femtomole quantities, which would de- 
pict the low abundance class of proteins. The process of 
annotation is similarly aided by the quality and richness of 
the sequence specific databases that are currently avail- 
able, both in the public domain and in the private sector 
(e.g. those supplied by Incyte Pharmaceuticals). In this re- 
spect, the advances in proteomics have benefited consider- 
ably from the breakthroughs achieved with genomics. 

From an application perspective, cancer studies provide a 
good opportunity whereby proteomics can be instrumental 
in identifying disease specific proteins, because it is often 
feasible to obtain normal and diseased tissue from the same 
patient. For example, proteomic studies have been re- 
ported on neuroblastomas 10 , human breast proteins from 
normal and tumour sources 11-13 , lung tumours 14 , colon tu- 
mours 15 and bladder tumours 16 . There are also proteomic 
studies reported within the cardiovascular therapeutic area, 
in which disease or response proteins are identified 17,18 .

Genomic microarray analysis can similarly identify 
unique species or clusters of mRNAs that are disease spe- 
cific. However, in some instances, there is a clear lack of 
correlation between the levels of a specific mRNA and its 
corresponding protein (Ref. 19, Gygi, S.P. et al., submitted).
This has now been noted by many investigators and
reaffirms that post-transcriptional events, including protein 
stability, protein modification (such as phosphorylation, 
glycosylation, acylation and methylation) and cell localiz- 
ation, can constitute major regulatory steps. Proteomic 
analysis captures all of these steps and can therefore pro- 
vide unique and valuable information independent from, 
or complementary to, genomic data. 
Proteomics for target validation and signal transduction
studies

The identification of disease specific proteins alone is in- 
sufficient to begin a drug screening process. It is critical to 
assign function and validation to these proteins by con- 
firming they are indeed pivotal in the disease process. 
These studies need to encompass both gain- and loss-of- 
function analyses. This would determine whether the activity 
of a candidate target (an enzyme, for example), eliminated 
by molecular/cellular techniques, could reverse a disease 
phenotype. If this happened, then the investigator would 
have increased confidence that a small-molecule inhibitor 
against the target would also have a similar effect. The 
proposal of candidate drug targets is often not a difficult 
process, but validating them is another matter. Validation 
represents a major bottleneck where the wrong decision 
can have serious consequences 20 . 

Proteomics can be used to evaluate the role of a chosen 
target protein in signal transduction cascades directly rel- 
evant to the disease. In this manner, valuable information 
is forthcoming on the signalling pathways that are per- 
turbed by a target protein and how they might be cor- 
rected by appropriate therapeutics. Techniques that are 
well established in one-dimensional protein studies to in- 
vestigate signalling pathways, such as western blotting 
and immunoprecipitation, are highly suited to proteomic 
applications. For example, the proteomes obtained can be 
blotted onto membranes and probed with antibodies 
against the target protein or related signalling molecules 21-23 .
Because proteomics can resolve >2000 proteins
on a single gel, it is possible to derive important
information on specific isoforms (such as glycosylated or 
phosphorylated variants) of signalling molecules. This will 
result in characterization of how they are altered in the 
disease process. Western immunoblotting techniques 
using high-affinity antibodies will typically identify proteins
present at ~10 copies per cell (~1.7 fmol); this is in
contrast to the best fluorescent dyes currently available 
that are limited to imaging proteins at 1000 or more 
copies per cell. The level of sensitivity derived by these 
applications will greatly facilitate interpretation of com- 
plex signalling pathways and contribute significantly to 
validation of the target under study. 
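
The relation between copies per cell and a molar amount can be checked with a short calculation. The text does not state how many cells a sample contains, so the figure of 10^8 cells used below is purely an assumption chosen for illustration; under that assumption, 10 copies per cell corresponds to roughly 1.7 fmol of protein. The function name `copies_to_fmol` is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_to_fmol(copies_per_cell, n_cells):
    """Convert copies per cell to femtomoles for a sample of n_cells cells."""
    molecules = copies_per_cell * n_cells
    moles = molecules / AVOGADRO
    return moles * 1e15  # 1 mol = 1e15 fmol


# Assumed sample size of 1e8 cells (not given in the text).
fmol = copies_to_fmol(10, 1e8)
print(round(fmol, 2))
```

On the same assumed sample, the 1000 copies per cell quoted for fluorescent dyes would correspond to an amount one hundred times larger.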

Immunoprecipitation studies 

Similarly, immunoprecipitation studies are another useful 
way to exploit the resolving power of proteomics 24,25 . In 
this instance, very large quantities of protein (e.g. several 
milligrams) can be subjected to incubation with antibodies 
against chosen signalling molecules. This allows high-affinity
capture of these proteins, which can subsequently be
eluted and electrophoresed on a 2D gel to provide a high- 
resolution proteome of a specific subset of proteins. 
Detection by blot analysis allows the identification of ex- 
tremely small amounts of defined signalling molecules. 
Again, the different isoforms of even very low abundance 
proteins can be seen, and, very importantly, the technique 
allows the investigator to identify multiprotein complexes 
or other proteins that co-precipitate with the target protein. 
These coassociating proteins frequently represent sig- 
nalling partners for the target protein, and their identifi- 
cation by mass spectrometry can lead to invaluable infor- 
mation on the signalling processes involved. 

The depth of signal transduction analysis offered by 
proteomics, and the utility for target validation studies, 
can be extended even further by applying cell fractionation
studies 26-28 . By purifying subcellular fractions, such
as membrane, nuclear, organelle and cytosolic, it is possi- 
ble to assign a localization to proteins of interest and to 
follow their trafficking in a cell. Enrichment of these frac- 
tions will also allow much higher representation of low 
abundance proteins on the proteome. Their detection by 
fluorescent dyes or immunoblot techniques will lead to 
the identification of proteins in the range of 1-10 copies 
per cell, putting the sensitivity on a par with genomic 
approaches. 

These signal transduction analyses can be of additional 
value in experiments where inhibitors derived from a 
screening programme against the target are being evalu- 
ated for their potency and selectivity. The inhibitors can 
encompass small molecules, antisense nucleic acid con- 
structs, dominant-negative proteins, or neutralizing anti- 
bodies microinjected into cells. In each case, proteome 
analysis can provide unique data in support of validation 
studies for a chosen candidate drug target. 

Proteomics and drug mode-of-action studies 

Once a validated target is committed to a screening regi- 
men to identify and advance a lead molecule, it is impor- 
tant to confirm that the efficacy of the inhibitor is through 
the expected mechanism. Such mode-of-action studies are 
usually tackled by various cell biological and biochemical 
methods. Proteomics can also be usefully applied to these 
studies and this is illustrated below by describing data ob- 
tained with OGT719. This is a novel galactosyl derivative of 
the cytotoxic agent 5-fluorouracil (5-FU), which is currently 
being developed by OGS for the treatment of hepatocel- 
lular carcinoma and colorectal metastases localized 
in the liver. The premise underpinning the design and ra- 
tionale of OGT719 was to derive a 5-FU prodrug capable 
of targeting, and being retained in, cells bearing the asialoglycoprotein
receptor (ASGP-r), including hepatocytes 29 ,
hepatoma Huh7 cells 30 and some colorectal tumour cells 31 .
The growth of the human hepatoma cell line Huh7 is inhibited
by 5-FU or by OGT719. If the inhibition by
OGT719 were the result of uptake and conversion to 5-FU
as the active component, then it would be expected that
Huh7 cells would show similar proteome profiles following
exposure to either drug.

Figure 4. Features that are specifically up- or downregulated in Huh7 cells by either 5-fluorouracil (5-FU) or
OGT719: (a) elongation factor 1α2, (b) novel (three peptides by MS-MS) and (c) α-subunit of prolyl-4-hydroxylase.
Arrows indicate up- or downregulated.

To examine these possibilities, we conducted an experi- 
ment taking samples of Huh7 cells that had been treated 
with IC50 doses of either OGT719 or 5-FU. Total cell lysates
were prepared and taken through 2D electrophoresis, 
fluorescence staining, digital imaging and Proteograph 
analysis. To facilitate the interpretation of the data across 
all of the 2291 features seen on the proteomes, drug- 
induced protein changes of fivefold or greater, identified 
by the Proteograph, were analysed further. Interestingly, 
from this analysis 19 identical proteins were changed five- 
fold or more by both drugs, strongly suggesting similarities 
in the mode of action for these two compounds. 
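
The comparison described above, in which features changed fivefold or more by each drug are selected and then intersected, can be sketched as follows. This is a minimal illustration rather than the authors' analysis: the function names, MCI labels and fold-change values are hypothetical, and requiring the change to be in the same direction for both drugs is an added assumption about what "co-modulated" should mean.

```python
def changed_features(fold_changes, cutoff=5.0):
    """Map each feature changed at least `cutoff`-fold to 'up' or 'down'.

    `fold_changes` maps an MCI label to treated/untreated ratio.
    """
    result = {}
    for mci, fold in fold_changes.items():
        if fold >= cutoff:
            result[mci] = "up"
        elif fold <= 1.0 / cutoff:
            result[mci] = "down"
    return result


def co_modulated(drug_a, drug_b, cutoff=5.0):
    """Features changed at least `cutoff`-fold, in the same direction, by both drugs."""
    a = changed_features(drug_a, cutoff)
    b = changed_features(drug_b, cutoff)
    return {mci: direction for mci, direction in a.items() if b.get(mci) == direction}


# Made-up fold changes, for illustration only.
five_fu = {"MCI-10": 6.2, "MCI-11": 0.15, "MCI-12": 1.3, "MCI-13": 8.0}
ogt719 = {"MCI-10": 5.5, "MCI-11": 0.10, "MCI-12": 7.0, "MCI-13": 1.1}
shared = co_modulated(five_fu, ogt719)
print(shared)
```

An empty result from the same function would correspond to the outcome reported later for HSN cells, where no proteins were co-modulated by the two drugs.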

Thus, from very complex data involving >2000 protein 
features, using proteomics it is possible to analyse quanti- 
tatively and qualitatively each protein during its exposure 
to drugs. The biologist is now able to focus a series of further
studies specifically on an enriched subset of proteins.
Figure 4 shows highlighted examples of the selected areas
of the proteome where some of these identified proteins in 
the above study are altered in response to either or both 
drugs. 

Several of the proteins identified above as being modu- 
lated similarly by 5-FU or OGT719 in Huh7 cells were sub- 
jected to tandem mass-spectrometric analysis for anno- 
tation. Some of these, such as the nuclear ribosomal 
RNA-binding protein 32 , can be placed into pyrimidine 
pathways or related cell cycle/growth biochemical path- 
ways in which 5-FU is known to act. 

To attribute further significance to the proteome mode- 
of-action studies with OGT719, another cell line, the rat 
sarcoma HSN, was used. Growth of these cells is inhibited 
by 5-FU, but they are completely refractory to OGT719; 
notably they lack the ASGP-r, which might explain this 
finding (unpublished). For our proteome studies, HSN 
cells were treated with 5-FU or OGT719 over a time course 
of one, two and four days. At each time point, cells were 
harvested and processed to derive proteomes and 
Proteographs. As before, we purposely focused on those 
proteins that increased or decreased by fivefold or more. 
In this instance, there were no proteins co-modulated by 
the two drugs. This is perhaps to be expected, given that 
the HSN cells are killed by 5-FU and yet are refractory to 
OGT719. 
Clear potential

The above is just an example of how proteomics can be 
used to address the mode of action of anticancer drugs. 
The potential of this approach is clear, and one can envis- 
age situations where it will be profitable to compare the 
proteomes of cells in which the drug target has been elimi- 
nated by molecular knockout techniques, or with small- 
molecule inhibitors believed to act specifically on the same 
target. In addition to using proteomics to examine the ac- 
tion of drugs, it is also possible to use this approach to 
gauge the extent of nonspecific effects that might eventu- 
ally lead to toxicity. For instance, in the example used 
above with HSN cells treated with OGT719, although cell 
growth was not affected, the levels of several specific pro- 
teins were changed. Further investigation of these proteins 
and the signalling pathways in which they are involved 
could be illuminating in predicting the likelihood or other- 
wise of long-term toxicity. 

Use of proteomics in formal drug 
toxicology studies

A drug discovery programme at the stage where leads 
have been identified and mode-of-action studies are ad- 
vanced, will proceed to investigate the pharmacokinetic 
and toxicology profile of those agents. These two param- 
eters are of major importance in the drug discovery 
process, and many agents that have looked highly promis- 
ing from in vitro studies have subsequently failed because 
of insurmountable pharmacokinetic and/or toxicity prob- 
lems in vivo. Whereas the pharmacokinetic properties of a 
molecule can now be characterized quickly and accu- 
rately, toxicity studies are typically much longer and more 
demanding in their interpretation. 

The ability to achieve fast and accurate predictions of 
toxicity within an in vivo setting would represent a big 
step forward in accelerating any drug discovery pro- 
gramme. Toxicity from a drug can be manifested in any 
organ. However, because the liver and kidney are the 
major sites in the body responsible for metabolism and 
elimination of most drugs, it is informative to examine 
these particular organs in detail to provide early indi- 
cations about events that might result in toxicity. 

The basis for most xenobiotic metabolizing activity is to 
increase the hydrophilicity of the compound and so facilitate
its removal from the body. Most drugs are metabolized
in the liver via the cytochrome P450 family of enzymes,
which are known to comprise a total of ~200
different members 33,34 , encompassing a wide array of
overlapping specificities for different substrates. In addition
to clearance, they also play a major role in metabolism
that can lead to the production and removal of toxic
species, and in some instances it is possible to correlate 
the ability or failure to remove such a toxin with a specific 
P450 or subgroup. 

Unique P450 profiles 

Each individual person will have a slightly different P450 
profile, largely from polymorphisms and changes in ex- 
pression levels, although other genetic and environmental 
factors aside from P450 also need to be taken into consid- 
eration. A significant amount of research is currently 
being directed towards this field - known as pharmacoge- 
nomics - with the aim of predicting how a patient will re- 
spond to a drug, as determined by their genetic make- 
up 35-37 . The marked variation of individuals in their ability 
to clear a compound can be one of the key factors in de- 
ciding the overall pharmacokinetic profile of a drug. Not 
only will this have a bearing on the likelihood of a patient 
responding to a treatment, but it will also be a factor in 
determining the possibility of their experiencing an ad- 
verse effect. 

Many pharmaceutical companies are already employing 
genomic approaches, involving P450 measurements, as a 
key step in their assessment of the toxicological profile of 
a candidate drug and therefore of its suitability, or other- 
wise, to be considered for human clinical trials. There are 
limits to this approach, however. Whereas the P450 mRNA
profiling can predict with some accuracy the likely meta- 
bolic fate of a drug, it will not provide information on 
whether the metabolites would subsequently lead to tox- 
icity. Besides the patient-to-patient differences in steady- 
state levels of the P450s, there are also characteristic induc- 
tion responses of these enzymes to some drugs. Moreover, 
as there can be some doubt over the correlation of mRNA 
levels and the corresponding protein levels, there is scope 
for misinterpretation of the results and hence real advan- 
tages to be gained from a proteome approach. In both in- 
stances, the ability to examine entire proteome profiles, in- 
cluding the P450 proteins, will be a significant advantage 
in understanding and predicting the metabolism and 
toxicological outcome of drugs. 

In addition to direct organ and tissue studies, the serum, 
which collects the majority of toxicity markers released 
from susceptible organs and tissues throughout the entire 
body, can be utilized. Serum is rich in nuclease activity 
and, as pharmacogenomics is not suited to deal with these 
samples, valuable markers of toxicity could go undetected. 
However, by using proteomics for these types of analyses, 
serum markers (and clusters thereof) are now accessible
for evaluation as indicators of toxicity. 
Pharmacoproteomics

Proteomics can thus be used to add a new sphere of 
analysis to the study of toxicity at the protein level, and in 
the era of '-omics' there is a case to be made to adopt the
term 'Pharmacoproteomics™'. Animals can be dosed with 
increasing levels of an experimental drug over time, and 
serum samples can be drawn for consecutive proteome 
analyses. Using this procedure, it should be possible to 
identify individual markers, or clusters thereof, that are 
dose related and correlate with the emergence and severity 
of toxicity. Markers might appear in the serum at a defined 
drug dose and time that are predictive of early toxicity 
within certain organs and if allowed to continue will have 
damaging consequences. These serum markers could sub- 
sequently be used to predict the response of each individ- 
ual and allow tailoring of therapy whereby optimal effi- 
cacy is achieved without adverse side effects being 
apparent. This application can obviously extend to track- 
ing toxicity of drugs in clinical trials where serum can be 
readily drawn and analysed. Surrogate markers for drug ef- 
ficacy could also be detected by this procedure and could 
facilitate the challenge of identifying patient classes who 
will respond favourably to a drug and at what dosage. 
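
One simple way to look for dose-related serum markers of the kind described above is to correlate each marker's level with the administered dose. The sketch below uses a Pearson correlation with an arbitrary threshold of 0.9; both choices are assumptions made for illustration and are not taken from the text, and the marker names, doses and levels are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no dose information
    return sxy / sqrt(sxx * syy)


def dose_related(doses, marker_levels, threshold=0.9):
    """Names of markers whose level rises or falls strongly with dose."""
    return sorted(
        name
        for name, levels in marker_levels.items()
        if abs(pearson(doses, levels)) >= threshold
    )


# Invented serum marker levels at increasing doses, for illustration only.
doses = [0, 1, 2, 4, 8]
levels = {
    "marker-A": [10, 12, 15, 21, 33],  # rises with dose
    "marker-B": [20, 19, 21, 20, 20],  # essentially flat
    "marker-C": [50, 44, 39, 27, 5],   # falls with dose
}
hits = dose_related(doses, levels)
print(hits)
```

A rank-based measure could be substituted where the dose response is monotonic but not linear; the structure of the screen would stay the same.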

Conclusions

By contrast to the agents administered to patients in clini- 
cal wards, the process of drug discovery is not a prescrip- 
tive series of steps. The risks are high and there are long 
timelines to be endured before it is known whether a can- 
didate drug will succeed or fail. At each step of the drug 
discovery process there is often scope for flexibility in in- 
terpretation, which over many steps is cumulative. The 
pharmaceutical companies most likely to succeed in this 
environment are those that are able to make informed 
accurate decisions within an accelerated process. 

The genomics revolution has impacted very positively 
upon these issues and now has a powerful new partner in 
proteomics. The ability to undertake global analysis of pro- 
teins from a very wide diversity of biological systems and 
to interrogate these in a high-throughput, systematic man- 
ner will add a significant new dimension to drug discov- 
ery. Each step of the process from target discovery to clini- 
cal trials is accessible to proteomics, often providing 
unique sets of data. Using the combination of genomics 
and proteomics, scientists can now see every dimension of 
their biological focus, from genes, mRNA, proteins and 
their subcellular localization. This will greatly assist our 
understanding of the fundamental mechanistic basis of 
human disease and allow new improved and speedier 
drug discovery strategies to be implemented. 
ABSTRACT Pairwise sequence comparison methods have
been assessed using proteins whose relationships are known 
reliably from their structures and functions, as described in 
the scop database [Murzin, A. G., Brenner, S. E., Hubbard, T. 
& Chothia C. (1995) /. MoL Biol. 247, 536-540]. The evalua- 
tion tested the programs blast [Altschul, S. F., Gish, W., 
Miller, W., Myers, E. W. & Lipman, D. J. (1990)./. MoL Biol. 
215, 403-410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996) 
Methods Enzymol. 266, 460-480], FASTA [Pearson, W. R. & 
Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448] , 
and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol. 
Biol. 147 r , 195-197] and their scoring schemes. The error rate 
of all algorithms is greatly reduced by using statistical scores 
to evaluate matches rather than percentage identity or raw 
scores. The E-value statistical scores of SSEARCH and FASTA are 
reliable: the number of false positives found in our tests agrees 
well with the scores reported. However, the P-values reported 
by BLAST and WU-BLAST2 exaggerate significance by orders of 
magnitude. SSEARCH, FASTA ktup = 1, and WU-BLAST2 perform 
best, and they are capable of detecting almost all relationships 
between proteins whose sequence identities are >30%. For 
more distantly related proteins, they do much less well; only 
one-half of the relationships between proteins with 20-30% 
identity are found. Because many homologs have low sequence 
similarity, most distant relationships cannot be detected by 
any pairwise comparison method; however, those which are 
identified may be used with confidence. 



Sequence database searching plays a role in virtually every 
branch of molecular biology and is crucial for interpreting the 
sequences issuing forth from genome projects. Given the 
method's central role, it is surprising that overall and relative 
capabilities of different procedures are largely unknown. It is 
difficult to verify algorithms on sample data because this 
requires large data sets of proteins whose evolutionary rela- 
tionships are known unambiguously and independently of the 
methods being evaluated. However, nearly all known ho- 
mologs have been identified by sequence analysis (the method 
to be tested). Also, it is generally very difficult to know, in the 
absence of structural data, whether two proteins that lack clear 
sequence similarity are unrelated. This has meant that al- 
though previous evaluations have helped improve sequence 
comparison, they have suffered from insufficient, imperfectly 
characterized, or artificial test data. Assessment also has been 
problematic because high quality database sequence searching 
attempts to have both sensitivity (detection of homologs) and 
specificity (rejection of unrelated proteins); however, these 
complementary goals are linked such that increasing one 
causes the other to be reduced. 
Sequence comparison methodologies have evolved rapidly, 
so no previously published test has evaluated modern versions 
of programs commonly used. For example, parameters in 
BLAST (1) have changed, and WU-BLAST2 (2), which produces 
gapped alignments, has become available. The latest version 
of FASTA (3) previously tested was 1.6, but the current release 
(version 3.0) provides fundamentally different results in the 
form of statistical scoring. 

The previous reports also have left gaps in our knowledge. 
For example, there has been no published assessment of 
thresholds for scoring schemes more sophisticated than per- 
centage identity. Thus, the widely discussed statistical scoring 
measures have never actually been evaluated on large data- 
bases of real proteins. Moreover, the different scoring schemes 
commonly in use have not been compared. 

Beyond these issues, there is a more fundamental question: 
in an absolute sense, how well does pairwise sequence com- 
parison work? That is, what fraction of homologous proteins 
can be detected using modern database searching methods? 

In this work, we attempt to answer these questions and to 
overcome both of the fundamental difficulties that have hin- 
dered assessment of sequence comparison methodologies. 
First, we use the set of distant evolutionary relationships in the 
SCOP: Structural Classification of Proteins database (4), which 
is derived from structural and functional characteristics (5). 
The SCOP database provides a uniquely reliable set of ho- 
mologs, which are known independently of sequence compar- 
ison. Second, we use an assessment method that jointly mea- 
sures both sensitivity and specificity. This method allows 
straightforward comparison of different sequence searching 
procedures. Further, it can be used to aid interpretation of real 
database searches and thus provide optimal and reliable 
results. 

Previous Assessments of Sequence Comparison. Several 
previous studies have examined the relative performance of 
different sequence comparison methods. The most encom- 
passing analyses have been by Pearson (6, 7), who compared 
the three most commonly used programs. Of these, the Smith- 
Waterman algorithm (8) implemented in SSEARCH (3) is the 
oldest and slowest but the most rigorous. Modern heuristics 
have provided BLAST (1) the speed and convenience to make 
it the most popular program. Intermediate between these two 
is FASTA (3), which may be run in two modes offering either 
greater speed (ktup = 2) or greater effectiveness (ktup = 1). 
Pearson also considered different parameters for each of these 
programs. 

To test the methods, Pearson selected two representative 
proteins from each of 67 protein superfamilies defined by the 
PIR database (9). Each was used as a query to search the 
database, and the matched proteins were marked as being 
homologous or unrelated according to their membership of PIR 



Abbreviation: EPQ, errors per query. 

†Present address: Department of Structural Biology, Stanford Uni- 
versity, Fairchild Building D-109, Stanford, CA 94305-5126. 

*To whom reprint requests should be addressed. e-mail: brenner@hyper.stanford.edu. 






6074 Biochemistry: Brenner et al 

superfamilies. Pearson found that modern matrices and "ln- 
scaling" of raw scores improve results considerably. He also 
reported that the rigorous Smith-Waterman algorithm worked 
slightly better than FASTA, which was in turn more effective 
than BLAST. 

Very large scale analyses of matrices have been performed 
(10), and Henikoff and Henikoff (11) also evaluated the 
effectiveness of BLAST and FASTA. Their test with BLAST 
considered the ability to detect homologs above a predeter- 
mined score but had no penalty for methods which also 
reported large numbers of spurious matches. The Henikoffs 
searched the SWISS-PROT database (12) and used PROSITE (13) 
to define homologous families. Their results showed that the 
BLOSUM62 matrix (14) performed markedly better than the 
extrapolated PAM-series matrices (15), which previously had 
been popular. 

A crucial aspect of any assessment is the data that are used 
to test the ability of the program to find homologs. But in 
Pearson's and the Henikoffs' evaluations of sequence com- 
parison, the correct results were effectively unknown. This is 
because the superfamilies in PIR and PROSITE are principally 
created by using the same sequence comparison methods 
which are being evaluated. Interdependency of data and 
methods creates a "chicken and egg" problem, and means for 
example, that new methods would be penalized for correctly 
identifying homologs missed by older programs. For instance, 
immunoglobulin variable and constant domains are clearly 
homologous, but PIR places them in different superfamilies. 
The problem is widespread: each superfamily in PIR 48.00 with 
a structural homolog is itself homologous to an average of 1.6 
other PIR superfamilies (16). 

To surmount these sorts of difficulties, Sander and Schnei- 
der (17) used protein structures to evaluate sequence com- 
parison. Rather than comparing different sequence compari- 
son algorithms, their work focused on determining a length- 
dependent threshold of percentage identity, above which all 
proteins would be of similar structure. A result of this analysis 
was the HSSP equation; it states that proteins with 25% identity 
over 80 residues will have similar structures, whereas shorter 
alignments require higher identity. (Other studies also have 
used structures (18-20), but these focused on a small number 
of model proteins and were principally oriented toward eval- 
uating alignment accuracy rather than homology detection.) 

A general solution to the problem of scoring comes from 
statistical measures (i.e., E-values and P-values) based on the 
extreme value distribution (21). Extreme value scoring was 
implemented analytically in the BLAST program using the 
Karlin and Altschul statistics (22, 23) and empirical ap- 
proaches have been recently added to FASTA and SSEARCH. In 
addition to being heralded as a reliable means of recognizing 
significantly similar proteins (24, 25), the mathematical trac- 
tability of statistical scores "is a crucial feature of the BLAST 
algorithm" (1). The validity of this scoring procedure has been 
tested analytically and empirically (see ref. 2 and references in 
ref. 24). However, all large empirical tests used random 
sequences that may lack the subtle structure found within 
biological sequences (26, 27) and obviously do not contain any 
real homologs. Thus, although many researchers have sug- 
gested that statistical scores be used to rank matches (24, 25, 
28), there have been no large rigorous experiments on biolog- 
ical data to determine the degree to which such rankings are 
superior. 

A Database for Testing Homology Detection. Since the 
discovery that the structures of hemoglobin and myoglobin are 
very similar though their sequences are not (29), it has been 
apparent that comparing structures is a more powerful (if less 
convenient) way to recognize distant evolutionary relation- 
ships than comparing sequences. If two proteins show a high 
degree of similarity in their structural details and function, it 
is very probable that they have an evolutionary relationship 
though their sequence similarity may be low. 

The recent growth of protein structure information, com- 
bined with the comprehensive evolutionary classification in 
the SCOP database (4, 5), has allowed us to overcome previous 
limitations. With these data, we can evaluate the performance 
of sequence comparison methods on real protein sequences 
whose relationships are known confidently. The SCOP database 
uses structural information to recognize distant homologs, the 
large majority of which can be determined unambiguously. 
These superfamilies, such as the globins or the immunoglobu- 
lins, would be recognized as related by the vast majority of the 
biological community despite the lack of high sequence sim- 
ilarity. 

From SCOP, we extracted the sequences of domains of 
proteins in the Protein Data Bank (PDB) (30) and created two 
databases. One (PDB90D-B) contains domains that are all <90% 
identical to one another, whereas the other (PDB40D-B) contains 
only domains <40% identical. The databases were created by first sorting all 
protein domains in SCOP by their quality and making a list. The 
highest quality domain was selected for inclusion in the 
database and removed from the list. Also removed from the list 
(and discarded) were all other domains above the threshold 
level of identity to the selected domain. This process was 
repeated until the list was empty. The PDB40D-B database 
contains 1,323 domains, which have 9,044 ordered pairs of 
distant relationships, or ≈0.5% of the total 1,749,006 ordered 
pairs. In PDB90D-B, the 2,079 domains have 53,988 relation- 
ships, representing 1.2% of all pairs. Low complexity regions 
of sequence can achieve spurious high scores, so these were 
masked in both databases by processing with the SEG program 
(27) using recommended parameters: 12 1.8 2.0. The databases 
used in this paper are available from http://sss.stanford.edu/ 
sss/, and databases derived from the current version of scop 
may be found at http://scop.mrc-lmb.cam.ac.uk/scop/. 
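
The selection procedure described above amounts to a greedy filter. The following is a minimal sketch of it in Python, not the authors' code. The per-domain `quality` scores and the `identity` function are hypothetical placeholders, since the text does not say how quality or pairwise identity were computed, and treating "above the threshold" as "at or above" is likewise an assumption.

```python
from typing import Callable, Dict, List


def select_nonredundant(
    quality: Dict[str, float],
    identity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Greedy selection: keep the best remaining domain, drop its near-duplicates.

    quality   -- hypothetical quality score per domain (higher is better)
    identity  -- hypothetical function giving percent identity of two domains
    threshold -- identity cutoff, e.g. 40.0 for a PDB40D-like set
    """
    # Sort all domains by quality, best first.
    remaining = sorted(quality, key=quality.get, reverse=True)
    selected: List[str] = []
    while remaining:
        best = remaining.pop(0)  # highest-quality domain still on the list
        selected.append(best)
        # Discard every other domain at or above the identity threshold to it.
        remaining = [d for d in remaining if identity(best, d) < threshold]
    return selected


def ordered_pairs(n: int) -> int:
    """Number of ordered (query, target) pairs among n distinct sequences."""
    return n * (n - 1)
```

As a check on the counts quoted in the text, `ordered_pairs(1323)` gives 1,749,006, the total reported for PDB40D-B.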

Analyses from both databases were generally consistent, but 
PDB40D-B focuses on distantly related proteins and reduces the 
heavy overrepresentation in the pdb of a small number of 
families (31, 32), whereas PDB90D-B (with more sequences) 
improves evaluations of statistics. Except where noted other- 
wise, the distant homolog results here are from PDB40D-B. 
Although the precise numbers reported here are specific to the 
structural domain databases used, we expect the trends to be 
general. 

Assessment Data and Procedure. Our assessment of se- 
quence comparison may be divided into four different major 
categories of tests. First, using just a single sequence compar- 
ison algorithm at a time, we evaluated the effectiveness of 
different scoring schemes. Second, we assessed the reliability 
of scoring procedures, including an evaluation of the validity 
of statistical scoring. Third, we compared sequence compari- 
son algorithms (using the optimal scoring scheme) to deter- 
mine their relative performance. Fourth, we examined the 
distribution of homologs and considered the power of pairwise 
sequence comparison to recognize them. All of the analyses 
used the databases of structurally identified homologs and a 
new assessment criterion. 

The analyses tested BLAST (1), version 1.4.9MP, and 
WU-BLAST2 (2), version 2.0a13MP. Also assessed was the FASTA 
package, version 3.0t76 (3), which provided FASTA and the 
SSEARCH implementation of Smith-Waterman (8). For 
SSEARCH and FASTA, we used BLOSUM45 with gap penalties 
-12/-1 (7, 16). The default parameters and matrix (BLO- 
SUM62) were used for BLAST and WU-BLAST2. 

The "Coverage Vs. Error" Plot. To test a particular protocol 
(comprising a program and scoring scheme), each sequence 
from the database was used as a query to search the database. 
This yielded ordered pairs of query and target sequences with 
associated scores, which were sorted, on the basis of their 
scores, from best to worst. The ideal method would have 
Fig. 1. Coverage vs. error plots of different scoring schemes for SSEARCH Smith-Waterman. (A) Analysis of PDB40D-B database. (B) Analysis 
of PDB90D-B database. All of the proteins in the database were compared with each other using the SSEARCH program. The results of this single 
set of comparisons were considered using five different scoring schemes and assessed. The graphs show the coverage and errors per query (EPQ) 
for statistical scores, raw scores, and three measures using percentage identity. In the coverage vs. error plot, the x axis indicates the fraction of 
all homologs in the database (known from structure) which have been detected. Precisely, it is the number of detected pairs of proteins with the 
same fold divided by the total number of pairs from a common superfamily. PDB40D-B contains a total of 9,044 homologs, so a score of 10% indicates 
identification of 904 relationships. The y axis reports the number of EPQ. Because there are 1,323 queries made in the PDB40D-B all-vs.-all 
comparison, 13 errors correspond to 0.01, or 1% EPQ. The y axis is presented on a log scale to show results over the widely varying degrees of 
accuracy which may be desired. The scores that correspond to the levels of EPQ and coverage are shown in Fig. 4 and Table 1. The graph 
demonstrates the trade-off between sensitivity and selectivity. As more homologs are found (moving to the right), more errors are made (moving 
up). The ideal method would be in the lower right corner of the graph, which corresponds to identifying many evolutionary relationships without 
selecting unrelated proteins. Three measures of percentage identity are plotted. Percentage identity within alignment is the degree of identity within 
the aligned region of the proteins, without consideration of the alignment length. Percentage identity within both is the number of identical residues 
in the aligned region as a percentage of the average length of the query and target proteins. The HSSP equation (17) is H = 290.15 l^{-0.562}, where 
l is length for 10 < l < 80; H > 100 for l < 10; H = 24.7 for l > 80. The percentage identity HSSP-adjusted score is the percent identity within 
the alignment minus H. Smith-Waterman raw scores and E-values were taken directly from the sequence comparison program. 



perfect separation, with all of the homologs at the top of the 
list and unrelated proteins below. In practice, perfect separa- 
tion is impossible to achieve so instead one is interested in 
drawing a threshold above which there are the largest number 
of related pairs of sequences consistent with an acceptable 
error rate. 

Our procedure involved measuring the coverage and error 
for every threshold. Coverage was defined as the fraction of 
structurally determined homologs that have scores above the 
selected threshold; this reflects the sensitivity of a method. 
Errors per query (EPQ), an indicator of selectivity, is the 
number of nonhomologous pairs above the threshold divided 
by the number of queries. Graphs of these data, called 
coverage vs. error plots, were devised to understand how 



protocols compare at different levels of accuracy. These 
graphs share effectively all of the beneficial features of Re- 
ceiver Operating Characteristic (ROC) plots (33, 34) but 
better represent the high degrees of accuracy required in 
sequence comparison and the huge background of nonho- 
mologs. 
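
The definitions of coverage and EPQ given above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' implementation: the function names are hypothetical, higher scores are assumed to be better (E-values would need to be negated first), and pairs scoring exactly at a threshold are counted as passing it.

```python
from typing import List, Tuple


def coverage_vs_error(
    hits: List[Tuple[float, bool]],
    total_homolog_pairs: int,
    n_queries: int,
) -> List[Tuple[float, float, float]]:
    """Return (threshold, coverage, errors_per_query) for each distinct score.

    hits -- (score, is_homolog) for every reported ordered pair of sequences
    """
    curve = []
    true_pos = 0
    false_pos = 0
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)  # best score first
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume all pairs tied at this score so a threshold never splits ties.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        curve.append((score, true_pos / total_homolog_pairs, false_pos / n_queries))
    return curve


def coverage_at_epq(curve: List[Tuple[float, float, float]], max_epq: float) -> float:
    """Largest coverage reachable without exceeding max_epq errors per query."""
    best = 0.0
    for _, coverage, epq in curve:
        if epq <= max_epq:
            best = max(best, coverage)
    return best
```

Plotting the second element of each tuple on the x axis and the third on a logarithmic y axis would give a curve of the kind shown in Fig. 1.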

This assessment procedure is directly relevant to practical 
sequence database searching, for it provides precisely the 
information necessary to perform a reliable sequence database 
search. The EPQ measure places a premium on score consis- 
tency; that is, it requires scores to be comparable for different 
queries. Consistency is an aspect which has been largely 

Percent Identity of Unrelated Proteins (PDB90D-B) 
Fig. 2. Unrelated proteins with high percentage identity. Hemo- 
globin β-chain (PDB code 1hds chain b, ref. 38, Left) and cellulase E2 
(PDB code 1tml, ref. 39, Right) have 39% identity over 64 residues, a 
level which is often believed to be indicative of homology. Despite this 
high degree of identity, their structures strongly suggest that these 
proteins are not related. Appropriately, neither the raw alignment 
score of 85 nor the E-value of 1.3 is significant. Proteins rendered by 
RASMOL (40). 
Fig. 3. Length and percentage identity of alignments of unrelated 
proteins in PDB90D-B: Each pair of nonhomologous proteins found with 
SSEARCH is plotted as a point whose position indicates the length and 
the percentage identity within the alignment. Because alignment 
length and percentage identity are quantized, many pairs of proteins 
may have exactly the same alignment length and percentage identity. 
The line shows the hssp threshold (though it is intended to be applied 
with a different matrix and parameters). 
Reliability of Statistical Scores (PDB90D-B) 
Fig. 4. Reliability of statistical scores in PDB90D-B: Each line shows 
the relationship between reported statistical score and actual error 
rate for a different program. E-values are reported for SSEARCH and 
FASTA, whereas P-values are shown for BLAST and WU-BLAST2. If the 
scoring were perfect, then the number of errors per query and the 
E-values would be the same, as indicated by the upper bold line. 
(P-values should be the same as EPQ for small numbers, and diverge 
at higher values, as indicated by the lower bold line.) E-values from 
SSEARCH and FASTA are shown to have good agreement with EPQ but 
underestimate the significance slightly. BLAST and WU-BLAST2 are 
overconfident, with the degree of exaggeration dependent upon the 
score. The results for PDB40D-B were similar to those for PDB90D-B 
despite the difference in number of homologs detected. This graph 
could be used to roughly calibrate the reliability of a given statistical 
score. 

ignored in previous tests but is essential for the straightforward 
or automatic interpretation of sequence comparison results. 
Further, it provides a clear indication of the confidence that 
should be ascribed to each match. Indeed, the EPQ measure 
should approximate the expectation value reported by data- 
base searching programs, if the programs' estimates are accu- 
rate. 

The Performance of Scoring Schemes. All of the programs 
tested could provide three fundamental types of scores. The 
first score is the percentage identity, which may be computed 
in several ways based on either the length of the alignment or 
the lengths of the sequences. The second is a "raw" or 
"Smith-Waterman" score, which is the measure optimized by 
the Smith-Waterman algorithm and is computed by summing 
the substitution matrix scores for each position in the align- 
ment and subtracting gap penalties. In BLAST, a measure 
related to this score is scaled into bits. Third is a statistical 
score based on the extreme value distribution. These results 
are summarized in Fig. 1. 
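
The raw score described above can be illustrated with a short Python sketch that scores an already-computed alignment. Several things here are assumptions rather than details from the text: the `toy_matrix` helper is a hypothetical stand-in for a real substitution matrix such as BLOSUM45, and the -12/-1 gap penalties are read as 12 for the first position of a gap and 1 for each further position.

```python
from typing import Dict, Tuple


def toy_matrix(alphabet: str, match: int = 5, mismatch: int = -2) -> Dict[Tuple[str, str], int]:
    """Hypothetical substitution matrix; a real search would load BLOSUM45 or similar."""
    return {(a, b): (match if a == b else mismatch) for a in alphabet for b in alphabet}


def raw_score(
    aligned_query: str,
    aligned_target: str,
    matrix: Dict[Tuple[str, str], int],
    gap_open: int = 12,
    gap_extend: int = 1,
) -> int:
    """Sum substitution scores over aligned columns and subtract gap penalties.

    Both strings must have the same length, with '-' marking gap positions.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned strings must have equal length")
    score = 0
    prev_gap = None  # which sequence carried a gap in the previous column
    for q, t in zip(aligned_query, aligned_target):
        if q == "-" and t == "-":
            raise ValueError("a column cannot contain two gaps")
        if q == "-" or t == "-":
            side = "query" if q == "-" else "target"
            # Assumed convention: first gap position costs gap_open, later ones gap_extend.
            score -= gap_extend if prev_gap == side else gap_open
            prev_gap = side
        else:
            score += matrix[(q, t)]
            prev_gap = None
    return score
```

The optimal local alignment found by Smith-Waterman is the one that maximizes this quantity; the sketch only evaluates a given alignment and does not search for it.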

Sequence Identity. Though it has been long established that 
percentage identity is a poor measure (35), there is a common 
rule-of-thumb stating that 30% identity signifies homology. 
Moreover, publications have indicated that 25% identity can 
be used as a threshold (17, 36). We find that these thresholds, 
originally derived years ago, are not supported by present 
results. As databases have grown, so have the possibilities for 
chance alignments with high identity; thus, the reported cutoffs 
lead to frequent errors. Fig. 2 shows one of the many pairs of 
proteins with very different structures that nonetheless have 
high levels of identity over considerable aligned regions. 
Despite the high identity, the raw and the statistical scores for 
such incorrect matches are typically not significant. The prin- 
cipal reasons percentage identity does so poorly seem to be 
that it ignores information about gaps and about the conser- 
vative or radical nature of residue substitutions. 

From the PDB90D-B analysis in Fig. 3, we learn that 30% 
identity is a reliable threshold for this database only for 
sequence alignments of at least 150 residues. Because one 
unrelated pair of proteins has 43.5% identity over 62 residues, 
it is probably necessary for alignments to be at least 70 residues 
in length before 40% is a reasonable threshold, for a database 
of this particular size and composition. 

At a given reliability, scores based on percentage identity 
detect just a fraction of the distant homologs found by 
statistical scoring. If one measures the percentage identity in 
the aligned regions without consideration of alignment length, 
then a negligible number of distant homologs are detected. 
Use of the HSSP equation improves the value of percentage 
identity, but even this measure can find only 4% of all known 
homologs at 1% EPQ. In short, percentage identity discards 
most of the information measured in a sequence comparison. 
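
The three percentage identity measures compared here, as defined in the legend to Fig. 1, can be written out in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the legend does not say how the boundary lengths of exactly 10 and 80 are handled (the power law is applied at both here), and "H > 100" for very short alignments is represented as infinity, meaning no such alignment can pass.

```python
import math


def hssp_threshold(length: int) -> float:
    """HSSP threshold H (percent identity) for an alignment of the given length."""
    if length < 10:
        return math.inf  # legend gives H > 100: no alignment this short can pass
    if length > 80:
        return 24.7
    return 290.15 * length ** -0.562


def pid_within_alignment(identical: int, aligned_length: int) -> float:
    """Percent identity over the aligned region only."""
    return 100.0 * identical / aligned_length


def pid_within_both(identical: int, query_length: int, target_length: int) -> float:
    """Identical residues as a percentage of the mean length of the two proteins."""
    return 100.0 * identical / ((query_length + target_length) / 2.0)


def hssp_adjusted(identical: int, aligned_length: int) -> float:
    """Percent identity within the alignment minus the HSSP threshold H."""
    return pid_within_alignment(identical, aligned_length) - hssp_threshold(aligned_length)
```

For an alignment like the one in Fig. 2, with roughly 39% identity over 64 residues (for instance 25 identities, an illustrative count), H is about 28, so the HSSP-adjusted score comes out positive even though the proteins are unrelated.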

Raw Scores. Smith-Waterman raw scores perform better 
than percentage identity (Fig. 1), but ln-scaling (7) provided no 
notable benefit in our analysis. It is necessary to be very precise 
when using either raw or bit scores because a 20% change in 
cutoff score could yield a tenfold difference in EPQ. However, 
it is difficult to choose appropriate thresholds because the 
reliability of a bit score depends on the lengths of the proteins 
matched and the size of the database. Raw score thresholds 
also are affected by matrix and gap parameters. 

Statistical Scores. Statistical scores were introduced partly 
to overcome the problems that arise from raw scores. This 
scoring scheme provides the best discrimination between 
homologous proteins and those which are unrelated. Most 
Fig. 5. Coverage vs. error plots of different sequence comparison methods: Five different sequence comparison methods are evaluated, each 
using statistical scores (E- or P-values). (A) PDB40D-B database. In this analysis, the best method is the slow SSEARCH, which finds 18% of relationships 
at 1% EPQ. FASTA ktup = 1 and WU-BLAST2 are almost as good. (B) PDB90D-B database. The quick WU-BLAST2 program provides the best coverage 
at 1% EPQ on this database, although at higher levels of error it becomes slightly worse than FASTA ktup = 1 and SSEARCH. 
likely, its power can be attributed to its incorporation of more 
information than any other measure; it takes account of the 
full substitution and gap data (like raw scores) but also has 
details about the sequence lengths and composition and is 
scaled appropriately. 

We find that statistical scores are not only powerful, but also 
easy to interpret. SSEARCH and FASTA show close agreement 
between statistical scores and actual number of errors per 
query (Fig. 4). The expectation value score gives a good, 
slightly conservative estimate of the chances of the two se- 
quences being found at random in a given query. Thus, an 
E-value of 0.01 indicates that roughly one pair of nonhomologs 
of this similarity should be found in every 100 different queries. 
Neither raw scores nor percentage identity can be interpreted 
in this way, and these results validate the suitability of the 
extreme value distribution for describing the scores from a 
database search. 
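
The interpretation of an E-value as an expected count of chance matches can be made concrete with a small Python sketch. The conversion P = 1 - exp(-E) used here is the standard Poisson relationship between an expected count and the probability of at least one event; it is supplied as background and is not stated in the text, although it is consistent with the remark in the legend to Fig. 4 that P-values and EPQ agree only for small values. The function names are hypothetical.

```python
import math


def p_from_e(e_value: float) -> float:
    """Probability of at least one chance match, given an expected count E.

    Assumes chance matches follow a Poisson distribution with mean E.
    """
    return -math.expm1(-e_value)  # equals 1 - exp(-E), accurate for small E


def expected_false_pairs(e_value: float, n_queries: int) -> float:
    """Expected number of nonhomologous pairs at or below this E-value over n queries."""
    return e_value * n_queries
```

With an E-value of 0.01, `expected_false_pairs(0.01, 100)` is about 1, matching the one-in-100-queries reading above, and `expected_false_pairs(0.01, 1323)` is about 13, the number of errors that corresponds to 1% EPQ in the PDB40D-B all-vs.-all comparison.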

The P-values from BLAST also should be directly interpret- 
able but were found to overstate significance by more than two 
orders of magnitude for 1% EPQ for this database. Nonethe- 
less, these results strongly suggest that the analytic theory is 
fundamentally appropriate. WU-BLAST2 scores were more re- 
liable than those from BLAST, but also exaggerate expected 
confidence by more than an order of magnitude at 1% EPQ. 

Overall Detection of Homologs and Comparison of Algo- 
rithms. The results in Fig. 5A and Table 1 show that pairwise 
sequence comparison is capable of identifying only a small 
fraction of the homologous pairs of sequences in PDB40D-B. 
Even ssearch with E-values, the best protocol tested, could 
find only 18% of all relationships at a 1% EPQ. BLAST, which 
identifies 15%, was the worst performer, whereas fasta 
ktup = 1 is nearly as effective as ssearch. fasta ktup = 2 and 
WU-BLAST2 are intermediate in their ability to detect ho- 
mologs. Comparison of different algorithms indicates that 
those capable of identifying more homologs are generally 
slower. SSEARCH is 25 times slower than BLAST and 6.5 times 
slower than FASTA ktup = 1. WU-BLAST2 is slightly faster than 
FASTA ktup = 2, but the latter has more interpretable scores. 

In PDB90D-B, where there are many close relationships, the 
best method can identify only 38% of structurally known 
homologs (Fig. 5B). The method which finds that many 
relationships is WU-BLAST2. Consequently, we infer that the 
differences between FASTA ktup = 1, SSEARCH, and WU-BLAST2 
programs are unlikely to be significant when compared with 
variation in database composition and scoring reliability. 

Fig. 6 helps to explain why most distant homologs cannot be 
found by sequence comparison: a great many such relation- 
ships have no more sequence identity than would be expected 
by chance. SSEARCH with E-values can recognize >90% of the 
homologous pairs with 30-40% identity. In this region, there 
are 30 pairs of homologous proteins that do not have signif- 
icant E-values, but 26 of these involve sequences with <50 
residues. Of sequences having 25-30% identity, 75% are 
identified by SSEARCH E-values. However, although the num- 
ber of homologs grows at lower levels of identity, the detection 
falls off sharply: only 40% of homologs with 20-25% identity 
Fig. 6. Distribution and detection of homologs in PDB40D-B. Bars 
show the distribution of homologous pairs in PDB40D-B according to their 
identity (using the measure of identity in both). Filled regions indicate 
the number of these pairs found by the best database searching method 
(SSEARCH with E-values) at 1% EPQ. The PDB40D-B database contains 
proteins with <40% identity, and as shown on this graph, most 
structurally identified homologs in the database have diverged ex- 
tremely far in sequence and have <20% identity. Note that the 
alignments may be inaccurate, especially at low levels of identity. Filled 
regions show that SSEARCH can identify most relationships that have 
25% or more identity, but its detection wanes sharply below 25%. 
Consequently, the great sequence divergence of most structurally 
identified evolutionary relationships effectively defeats the ability of 
pairwise sequence comparison to detect them. 

are detected and only 10% of those with 15-20% can be found. 
These results show that statistical scores can find related 
proteins whose identity is remarkably low; however, the power 
of the method is restricted by the great divergence of many 
protein sequences. 

After completion of this work, a new version of pairwise 
BLAST was released: BLASTGP (37). It supports gapped align- 
ments, like WU-BLAST2, and dispenses with sum statistics. Our 
initial tests on BLASTGP using default parameters show that its 
E-values are reliable and that its overall detection of homologs 
was substantially better than that of ungapped BLAST, but not 
quite equal to that of WU-BLAST2. 

CONCLUSION 

The general consensus amongst experts (see refs. 7, 24, 25, 27 
and references therein) suggests that the most effective se- 
quence searches are made by (i) using a large current database 
in which the protein sequences have been complexity masked 
and (ii) using statistical scores to interpret the results. Our 
experiments fully support this view. 

Our results also suggest two further points. First, the E-val- 
ues reported by FASTA and SSEARCH give fairly accurate 
estimates of the significance of each match, but the P-values 
provided by BLAST and WU-BLAST2 underestimate the true 



Table 1. Summary of sequence comparison methods with PDB40D-B 

Method                                  Relative Time*   1% EPQ Cutoff      Coverage at 1% EPQ (%) 
SSEARCH % identity: within alignment    25.5             >70%               <0.1 
SSEARCH % identity: within both         25.5             34%                3.0 
SSEARCH % identity: HSSP-scaled         25.5             35% (HSSP + 9.8)   4.0 
SSEARCH Smith-Waterman raw scores       25.5             142                10.5 
SSEARCH E-values                        25.5             0.03               18.4 
FASTA ktup = 1 E-values                 3.9              0.03               17.9 
FASTA ktup = 2 E-values                 1.4              0.03               16.7 
WU-BLAST2 P-values                      1.1              0.003              17.5 
BLAST P-values                          1.0              0.00016            14.8 

*Times are from large database searches with genome proteins. 
extent of errors. Second, SSEARCH, WU-BLAST2, and FASTA 
ktup = 1 perform best, though BLAST and FASTA ktup = 2 
detect most of the relationships found by the best procedures 
and are appropriate for rapid initial searches. 

The homologous proteins that are found by sequence com- 
parison can be distinguished with high reliability from the huge 
number of unrelated pairs. However, even the best database 
searching procedures tested fail to find the large majority of 
distant evolutionary relationships at an acceptable error rate. 
Thus, if the procedures assessed here fail to find a reliable 
match, it does not imply that the sequence is unique; rather, it 
indicates that any relatives it might have are distant ones.** 



** Additional and updated information about this work, including 
supplementary figures, may be found at http://sss.stanford.edu/sss/. 
Annotation Transfer for Genomics: Measuring 
Functional Divergence in Multi-Domain Proteins 
Annotation transfer is a principal process in genome annotation. It involves "transferring" structural and 
functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from 
experimentally characterized proteins similar in sequence. To prevent errors in genome annotation, it is 
important that this process be robust and statistically well-characterized, especially with regard to how it 
depends on the degree of sequence similarity. Previously, we and others have analyzed annotation transfer in 
single-domain proteins. Multi-domain proteins, which make up the bulk of the ORFs in eukaryotic genomes, 
present more complex issues in functional conservation. Here we present a large-scale survey of annotation 
transfer in these proteins, using scop superfamilies to define domain folds and a thesaurus based on 
SWISS-PROT keywords to define functional categories. Our survey reveals that multi-domain proteins have 
significantly less functional conservation than single-domain ones, except when they share the exact same 
combination of domain folds. In particular, we find that for multi-domain proteins, approximate function can be 
accurately transferred with only 35% certainty for pairs of proteins sharing one structural superfamily. In 
contrast, this value is 67% for pairs of single-domain proteins sharing the same structural superfamily. On the 
other hand, if two multi-domain proteins contain the same combination of two structural superfamilies the 
probability of their sharing the same function increases to 80%. In the case of complete coverage along the full 
length of both proteins, this value increases further to > 90%. Moreover, we found that only 70 of the current 
total of 455 structural superfamilies are found in both single and multi-domain proteins and only 14 of these 
were associated with the same function in both categories of proteins. We also investigated the degree to which 
function could be transferred between pairs of multi-domain proteins with respect to the degree of sequence 
similarity between them, finding that functional divergence at a given amount of sequence similarity is always 
about two-fold greater for pairs of multi-domain proteins (sharing similarity over a single domain) in 
comparison to pairs of single-domain ones, though the overall shape of the relationship is quite similar. Further 
information is available at http://partslist.org/func or http://bioinfo.mbb.yale.edu/partslist/func. 



The ultimate goal of the genome projects is to determine the 
structure and function of all the newly identified gene prod- 
ucts. Fundamentally, this will be carried out via annotation 
transfer, transferring the structural and functional annotation 
from an experimentally characterized protein (as in a model 
organism such as Escherichia coli) to a predicted protein in a 
newly sequenced genome that shares similarity in sequence. 
The degree of annotation transferred will depend on the de- 
gree of sequence similarity. This process is shown schemati- 
cally in Figure 1. In this paper, we aim to address this major 
question in bioinformatics, specifically focusing on multi- 
domain proteins, as they make up the bulk of the proteome in 
eukaryotic organisms (Gerstein 1998). 

Our work is a direct outgrowth of two previous analyses 
of ours that concentrated on single-domain proteins. In an 
earlier paper, we found that the different structural classes of 
the scop classification system have different propensities to 
carry out certain types of function (Hegyi and Gerstein 1999). 
In particular, while the alpha/beta folds were disproportion- 
ately associated with enzymes and all-alpha and small folds 
with non-enzymes, the alpha + beta structures had an equal 
tendency for both enzymatic and non-enzymatic functions. 

Corresponding author. 
E-MAIL Mark.Gerstein@yale.edu 

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.183801. 



Wilson et al. (2000) compared a large number of protein do- 
mains to one another in a pair-wise fashion with respect to 
similarities in sequence, structure, and function. Using a hy- 
brid functional classification scheme merging the ENZYME 
and FlyBase systems (Gelbart et al. 1997; Bairoch 2000), they 
found that precise function is not conserved below 30-40% 
identity, although the broad functional class is usually pre- 
served for sequence identities as low as 20-25%, given that 
the sequences have the same fold. Their survey also reinforced 
the previously established general exponential relationship 
between structural and sequence similarity (Chothia and Lesk 
1986). 

Other Work on Establishing Relationships between 
Sequence, Structure, and Function 

Several other groups have studied the relationship between 
sequence, structure, and function in detail, attempting to de- 
termine the extent to which functional transference between 
matching proteins is feasible (Shah and Hunter 1997; Martin 
et al. 1998; Thornton et al. 1999, 2000; Zhang et al. 1999; 
Shapiro and Harris 2000; Todd et al. 2001). Orengo et al. 
(1999) analyzed protein families in the CATH database and 
concluded that > 96% of the folds in the PDB are associated 
with a single homologous family. By investigating enzymatic 
folds they also found that more than 95% of homologous 
families show either single or closely related functions. 
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Figure 1 Schematic illustrating annotation transfer. This figure illustrates the process of annotation transfer for a group of hypothetical TIM barrel 
proteins. The leftmost panel represents sequence comparisons between idealized barrel domains from a number of organisms. The next panel 
shows analogous results for structural comparison, and the panel after that, functional comparison. The rightmost panel represents sequence 
comparisons between idealized multi-domain proteins that match over a single domain, the subject of much of this paper. 



Pawlowski et al. (2000) studied the relationship between se- 
quence and functional similarity in the twilight zone of 10%- 
15% sequence similarity and found a clear correlation be- 
tween the two, with functional similarity based on the E.C. 
classification of enzymes. 

Russell et al. (1997) analyzed binding sites in proteins 
with similar 3D structures and estimated that 90% of new 
remote homologs have common binding sites and similar 
functions. Eisenstein et al. (2000) evaluated the first results 
from the structural genomics projects and found that in many 
instances the protein structure itself offers an important clue 
to its biological function. Stawiski et al. (2000) found that 
function could be predicted rather successfully for just the 
proteases. Devos and Valencia (2000) presented a critical view 
of function transference between similar sequences, high- 
lighting the limitations of this process due to errors in data- 
bases and the inherent complexity of the relationship be- 
tween protein sequence-structure and function that does not 
allow "simplistic interpretations." They also found that bind- 
ing sites are the least conserved features between related pro- 
teins while the catalytic activity of enzymes is the most con- 
served one. 

Multi-Domain Proteins with Divergent Functions: 
How Common? 

Most of these previous investigations focused on single- 
domain proteins or did not distinguish between single- and 
multi-domain ones. It is not clear how the multi-domain pro- 
teins with various functions behave with respect to functional 
conservation; namely, whether they are more or less con- 
served than their single-domain counterparts. In particular, as 
shown in Figure 1, if one multi-domain protein shares a single 
domain fold with another one, it is not clear the degree to 
which the functional conservation of these proteins is con- 
strained by the shared part, and to what degree it is influenced 
by other domains that are not shared. 

Specific groups of proteins that have the same combina- 
tion of structural domains but dramatically different func- 
tions illustrate this situation. One example is the combination 



of the SH3-domain (scop superfamily identifier 2.24.2) and 
the P-loop containing NTP hydrolase (3.29.1). While in 
higher organisms this combination is associated with presyn- 
aptic and tumor suppressor functions (SWISS-PROT names 
SP02_HUMAN and DLG1_DROME, respectively), in the lower 
Dictyostelium it was found in myosin (MYSP_DICDI). An- 
other example is the combination of the FAD/NAD(P)- 
binding superfamily and FAD-linked reductases C-terminal 
superfamily (3.4.1 and 4.12.1 superfamilies, respectively). In 
one group of proteins they appear in enzymes of the oxidoreductase 
group (e.g., OXDA_CAEEL or PHHY_PSEAE), while 
in another they are found in a dissociation inhibitor (e.g., 
GDIA_HUMAN). It should be noted that the proteins are not 
covered completely by the structural matches, so it is quite 
possible that the rest of them contain totally different do- 
mains that are responsible for the dramatically different func- 
tions. However, do these two examples show a rather rare or 
a more frequent phenomenon? How often do multi-domain 
proteins, sharing the same structural domain composition, 
differ in their functions? 

In this paper, we attempt to provide a comprehensive 
answer to this question. This is particularly timely given that 
most of the unknown proteins in eukaryotic genomes are 
multi-domain. We use the same approach as in our previous 
analyses, comparing the sequences of the structural domains 
in scop to those of SWISS-PROT using BLASTP. We focus on 
the functional divergence of single and multi-domain pro- 
teins, extending previous investigations of single-domain 
proteins. Also, in comparison to previous work, we focus 
more on non-enzymatic functions and scop structural super- 
families, instead of folds. 

RESULTS 

Our Approach to Functional 
and Structural Assignment 

We used the blastp program (version 2.0) (Altschul et al. 
1997) to identify the scop 1.39 (Murzin et al. 1995) structural 
domains in SWISS-PROT (version 37) (Bairoch and Apweiler 
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2000) with an E-value cutoff of e = 10^-4. We removed the hypothetical and 
fragment proteins. This resulted in two sets of proteins. 

Single-Domain 

Of the single-domain matches, only those that were almost 
completely covered with a match to a single structural do- 
main were selected. (The maximum number of uncovered 
residues was set at 70 with an additional condition that a 
maximum of 40 residues on the N-terminal end and 30 resi- 
dues on the C-terminus were allowed to be uncovered.) These 
criteria resulted in 1818 single-domain proteins being selected 
from SWISS-PROT. 
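
The coverage rule described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (at most 70 uncovered residues in total, at most 40 at the N-terminus and 30 at the C-terminus); the `DomainMatch` record, its field names, and the 1-based inclusive coordinates are assumptions made for the example and are not taken from the original analysis.

```python
from dataclasses import dataclass

# Thresholds quoted in the text for single-domain proteins.
MAX_UNCOVERED_TOTAL = 70
MAX_UNCOVERED_N_TERM = 40
MAX_UNCOVERED_C_TERM = 30


@dataclass(frozen=True)
class DomainMatch:
    """Hypothetical record of one structural-domain match on a protein."""
    protein_length: int  # number of residues in the SWISS-PROT protein
    match_start: int     # first matched residue (1-based, inclusive)
    match_end: int       # last matched residue (1-based, inclusive)


def is_single_domain_covered(match: DomainMatch) -> bool:
    """Return True if the match covers the protein almost completely."""
    n_term_uncovered = match.match_start - 1
    c_term_uncovered = match.protein_length - match.match_end
    return (
        n_term_uncovered <= MAX_UNCOVERED_N_TERM
        and c_term_uncovered <= MAX_UNCOVERED_C_TERM
        and n_term_uncovered + c_term_uncovered <= MAX_UNCOVERED_TOTAL
    )
```

For a single contiguous match the two terminal limits already imply the 70-residue total; the total is still checked explicitly so that the function mirrors the wording of the criteria.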

Multi-Domain 

We selected 4763 multi-domain proteins from SWISS-PROT. 
All of these matched (in different locations) at least two do- 
mains of known structure belonging to different scop super- 
families (see schematic in Figure 1). We also selected a subset 
of these proteins that have almost their entire length covered 
by matches with structural domains (allowing again a maxi- 
mum of 70 uncovered residues). This selection resulted in 
2829 proteins being selected from SWISS-PROT. (In all cases, 
duplicate matches were removed, i.e., a protein at a certain 
location matches only one structural domain.) 
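
As a rough sketch of the selection just described, the following Python fragment removes duplicate matches at the same location and then asks whether at least two different superfamilies remain. The text does not say which of several overlapping matches was kept, so keeping the hit with the lowest E-value is an assumption of this example, as are the `Hit` record and its field names.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical record of one scop domain match on a protein."""
    start: int         # first matched residue (1-based, inclusive)
    end: int           # last matched residue (1-based, inclusive)
    superfamily: str   # scop superfamily identifier, e.g. "3.29.1"
    evalue: float      # E-value of the match


def remove_duplicate_matches(hits: List[Hit]) -> List[Hit]:
    """Keep one hit per location (assumption: the lowest E-value wins)."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        # Accept the hit only if it overlaps none of the hits kept so far.
        if all(hit.end < k.start or hit.start > k.end for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


def is_multi_domain(hits: List[Hit]) -> bool:
    """True if non-overlapping hits come from at least two superfamilies."""
    kept = remove_duplicate_matches(hits)
    return len({h.superfamily for h in kept}) >= 2
```

A protein with hits to 3.29.1 at residues 1-100 and to 2.24.2 at residues 120-200 would count as multi-domain here, whereas two overlapping hits at residues 1-100 would collapse to a single domain.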

We set out to compare these two sets of proteins for 
functional divergence. As previously, we divided functions 
into enzyme and non-enzyme (Hegyi and Gerstein 1999). En- 
zymatic functions were classified by the EC system (Bairoch 
2000). Comparisons of enzymatic functions were treated the 
same way as in our earlier analyses, that is, if they differ in the 
first three components of their respective EC numbers, they 
were considered different. This implied that our analysis dealt 
with a total of 112 enzymatic functions. Non-enzymatic func- 
tions were classified into 508 different categories based on a 
simple thesaurus we assembled of synonymous keywords 
drawn from SWISS-PROT description lines. In addition, we 
created 49 categories for functions that have an enzymatic 
component but which are not part of the EC system. This gave 
us a total of 669 functions (112 + 508 + 49). (The list of all the 
functional categories is described further in Table 2 below, 
and also can be found on the Web at 
http://bioinfo.mbb.yale.edu/partslist/func or http://partslist.org/func.) 
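
The rule for comparing enzymatic functions can be illustrated with a short Python sketch. It assumes EC numbers are given as dot-separated strings such as "1.8.1.4" and that unassigned components written as "-" are compared literally; both conventions, and the function names, are choices made for this example only.

```python
from typing import Tuple


def ec_class(ec_number: str, levels: int = 3) -> Tuple[str, ...]:
    """Return the first `levels` components of a dot-separated EC number."""
    return tuple(ec_number.strip().split(".")[:levels])


def same_enzymatic_function(ec_a: str, ec_b: str) -> bool:
    """Two enzymes count as the same function if their EC numbers agree
    in the first three components."""
    return ec_class(ec_a) == ec_class(ec_b)


# Function categories quoted in the text: enzymatic, non-enzymatic, and
# enzymatic functions that are not part of the EC system.
TOTAL_FUNCTIONS = 112 + 508 + 49
```

Under this rule, 1.8.1.4 and 1.8.1.7 are treated as the same function, while 1.8.1.4 and 1.8.2.1 are treated as different.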

Overall Distribution of the Matches 

Figure 2 shows the most commonly observed multi-domain 
combinations in a set of recently sequenced genomes. The 
occurrences of further combinations are available from the 
Web site. Clearly, the distribution is very skewed, with certain 
combinations, such as 3.29-2.32 and 2.29-4.61, tending to 
predominate. 

Figure 3 shows the overall distribution of the single- 
domain and multi-domain matches in the different structural 
classes. The distribution of matches between enzymes and 
non-enzymes in multi-domain proteins largely agrees with 
that in the single-domain proteins. The multi-domain 
matches follow the overall tendency of the alpha/beta folds to 
be associated with enzymes to a larger extent and the all- 
alpha and small folds with non-enzymes. However, the values 
for the multi-domain matches are generally less extreme than 
for single-domains; for example, the 10-fold difference be- 
tween single-domain alpha/beta enzymes and non-enzymes 
decreases to about twofold in multi-domain proteins. Another 
significant difference is the reduction in the number of multi- 
domain non-enzymes in the all-beta and alpha + beta structural 
classes compared to the single-domain matches. Altogether, 
there are more enzymes than non-enzymes among the 
multi-domain proteins (2805 enzymes vs. 1958 non-enzymes) 
whereas for single-domain proteins, the opposite is true (850 
enzymes vs. 968 non-enzymes). 

[Figure 2: grid of scop fold pairs (rows: fold 1, fold 2) against genomes (columns). Row labels that 
survive as pairs: 3.29-2.32, 2.29-4.61, 4.1-4.34, 3.4-4.48, 3.22-4.42, 2.32-4.1, 2.32-2.33, 4.32-3.1, 
3.23-4.69, 3.47-5.17, 4.72-5.13, 3.22-4.1. Further row labels, fold 1: 3.1, 3.42, 3.3, 4.1, 4.34, 1.79, 
2.34; fold 2: 3.5, 4.61, 1.76, 4.29, 2.32, 3.22, 3.52.] 

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Figure 2 Distribution of multi-domain combinations amongst the 
genomes. The figure shows the occurrence of multi-domain fold com- 
binations in a number of genomes, indicating its great variability. 
Each row indicates a particular combination of scop fold pairs (using 
scop 1.39), where a fold pair is defined as two distinct folds occurring 
in tandem in a protein. Each column represents a different genome, 
using the four-letter codes in the PartsList system (Qian et al. 2001): 
Aaeo, Aquifex aeolicus; Aful, Archaeoglobus fulgidus; Bbur, Borrelia 
burgdorferi; Bsub, Bacillus subtilis; Cele, Caenorhabditis elegans; Cpne, 
Chlamydia pneumoniae; Ctra, Chlamydia trachomatis; Ecol, Escherichia 
coli; Hinf, Haemophilus influenzae Rd; Hpyl, Helicobacter pylori; Mthe, 
Methanobacterium thermoautotrophicum; Mjan, Methanococcus jan- 
naschii; Mtub, Mycobacterium tuberculosis; Mgen, Mycoplasma geni- 
talium; Mpne, Mycoplasma pneumoniae; Phor, Pyrococcus horikoshii; 
Rpro, Rickettsia prowazekii; Scer, Saccharomyces cerevisiae; Syne, Syn- 
echocystis sp.; Tpal, Treponema pallidum. The numbers in each inter- 
section cell indicate the number of times the fold pairs occur in a 
genome. Only the 20 most common fold pair combinations are 
shown here; the remainder are shown on the Web site 
(http://partslist.org/func). If a cell is greater than 6, it is shaded black; be- 
tween 3 and 6, gray; and below 3, white. The blank spaces show 
instances in which one of the pairs does not occur in the organism at 
all (indicated by a value of -1 in the data table on the Web site). The 
fold assignments are done in a fashion consistent with those in 
PartsList and associated systems (Gerstein 1997; Lin et al. 2000; 
Drawid et al. 2001; Harrison et al. 2001; Qian et al. 2001). 

Table 1 summarizes the distribution of superfamilies and 
superfamily combinations among the major functional 
classes, i.e. whether they have only enzymatic, only non- 
enzymatic or both enzymatic and non-enzymatic functional- 
ity. Altogether, 215 superfamilies were found in single-domain 
proteins and 310 in multi-domain ones. As 70 superfamilies 
were found in both, altogether 455 distinct structural super- 
families matched a SWISS-PROT protein with our required 
coverage criteria (described above). Similarly, we apportioned 
the 281 superfamily combinations observed in multi-domain 
proteins amongst different broad functional categories. 

In single-domain proteins there are about as many su- 
perfamilies with exclusively enzymatic functionality as there 
are those with exclusively non-enzymatic functions (82 vs. 
78). In contrast, in multi-domain proteins this ratio increases 
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Figure 3 Distribution of proteins amongst broad structural and 
functional classes; the distribution of the matches among the seven 
structural and two functional classes in single- and multi-domain pro- 
teins. The single-domain and multi-domain matches each total 
100%, independently of each other. The horizontal axis indicates the 
seven scop classes, which are (from 1 to 7): all-alpha, all-beta, alpha/ 
beta, alpha + beta, multi-domain, membrane, and small protein. 



to almost threefold (135 vs. 56). This agrees with the notion 
that most enzymes are multi-domain. Another difference be- 
tween single and multi-domain proteins appears in the ratio 
of superfamilies with a single function compared to multi- 
functional ones. As is apparent from Table 1, about a quar- 
ter of the superfamilies matched single-domain proteins with 
different functions (55 of 215), whereas in the multi-domain 
proteins, this ratio increased to more than a third (119 of 310). 

Single-Domain Proteins 

Table 2 lists the two functionally most diverse structural su- 
perfamilies in single-domain proteins with some representa- 
tive functions. The most diverse superfamily, the 3.38.1 
Thioredoxin-like, has 11 different functions associated with 
it, most of them with an oxidoreductase mechanism. For in- 
stance, THIO_BPT4 is a small disulphide-containing thiore- 
doxin that serves as a general disulphide oxidoreductase, 



while TDX2_BRUMA is almost twice as long (199 aa) and 
serves as a thiol-specific antioxidant that acts against sulfur- 
containing radicals. Another interesting example of func- 
tional diversity is provided by the Scorpion toxin-like super- 
family (7.3.6). While BRAZ_PENBA is a small protein that is 
known to be 2000 times sweeter than sucrose, the other mem- 
bers of the superfamily are associated with different host- 
defense mechanisms. In insects the superfamily possesses 
antifungal activity (DMYC_DROME) or acts as a toxin 
(SCX5_BUTEU). Interestingly, in plants it can also act as an 
antifungal (AF2B_SINAL) or as an inhibitor of insect alpha- 
amylases (SIA1_SORBI). It appears that many single-domain 
proteins are toxins or allergens, or are related in other ways to 
a host-defense response. 

Based on the data we can also determine the probability 
of two single-domain proteins that match domains in the 
same superfamily category also carrying out the same func- 
tion. Using Bayes' theorem: 

P(F|S) = P(F)P(S|F)/(P(F)P(S|F) + P(~F)P(S|~F)) (1) 

where S is the event that two proteins share the same 
superfamily, F is the event that two proteins have the 
same function, and ~F is the event that two proteins do 
not have the same function. Rearranging and simplifying the 
equation we get: 

P(F|S) = 1/(1 + N(S,~F)/N(S,F)) (2) 

where N is the number of times that the two events in the 
parentheses occur together in our database of 1818 single- 
domain proteins. This results in 

P(F|S) = 1/(1 + 8501/12516) = 68%. 

That is, the probability that two single-domain proteins that 
have the same superfamily structure have the same function 
(whether enzymatic or not) is about 2/3. 
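
The step from Equation 1 to Equation 2 can be written out explicitly. The following is a sketch of the intermediate algebra, under the assumption that the joint probabilities are estimated by the pair counts N taken over one and the same set of protein pairs, so that the common normalizing total cancels:

```latex
\begin{align}
P(F \mid S)
  &= \frac{P(F)\,P(S \mid F)}{P(F)\,P(S \mid F) + P({\sim}F)\,P(S \mid {\sim}F)} \\
  &= \frac{P(S, F)}{P(S, F) + P(S, {\sim}F)} \\
  &\approx \frac{N(S, F)}{N(S, F) + N(S, {\sim}F)}
   = \frac{1}{1 + N(S, {\sim}F) / N(S, F)}
\end{align}
```

The second line uses the product rule, P(F)P(S|F) = P(S,F), and the last line divides numerator and denominator by N(S,F).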

Multi-Domain Proteins 

Table 3 lists the combinations of superfamilies that have been 
associated with the greatest number of different functions in 
multi-domain proteins, with representative entries in SWISS- 
PROT. The combination with the greatest number of different 
functions is that of 1.95.1 and 7.33.1. Although it has twice as 
many different functions as the most diverse superfamily in 



Table 1. Functional Distribution of Single-domain, Multi-domain Superfamilies, and 
Multi-domain Combinations 

                  Single-domain         Multi-domain          Multi-domain sfam
                  superfamilies         superfamilies         combinations

                  Single     Multiple   Single     Multiple   Single     Multiple
                  function   function   function   function   function   function

Enzymatic         82         11         135        42         151        16
Nonenzymatic      78         23         56         30         70         27
Both functions               15                    47                    17
Total             160        55         191        119        221        60

The basic functional distribution of the superfamilies in single- and multi-domain proteins and the 
functional distribution of multi-domain combinations are shown. The first row lists the number of 
scop superfamilies that were associated only with enzymatic function in each category. The second 
row lists the number associated with only nonenzymatic functions, and the third row indicates the 
number of superfamilies that were associated with both types of function. Altogether, we charac- 
terized 160 + 55 = 215 single-domain and 191 + 119 = 310 multi-domain superfamilies, 70 of 
which overlapped in the two categories. 
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Table 2. Most Versatile Single-Domain Superfamilies 

No.   No.   Sfam
func  prot  comb    Function       SWISS-PROT ID  SWISS-PROT function

11    69    3.38.1  E1.11.1        GSHP_RAT       Plasma Glutathione Peroxidase (1.11.1.9)
                    263#           DYL5_CHLRE     Dynein, Flagellar Outer Arm-C. reinhardtii
                    D260#          BSAA_BACSU     Glutathione Peroxidase Homolog Bsaa
                    268#           REHY_TORRU     Rehydrin-Tortula ruralis (Moss)
                    266#           PHOS_HUMAN     Phosducin (33 Kd Phototransducing Protein)
                    269#           REHY_ORYSA     Rad24 Protein-Oryza sativa (Rice)
                    272#           THIO_BPT4      Thioredoxin (Bacteriophage T4)
                    D271#272#      TDX2_BRUMA     Thioredoxin Peroxidase 2
                    261#           BTUE_ECOLI     Vitamin B12 Transport Periplasmic Protein Btue

10    28    7.3.6   342#           BRAZ_PENBA     Brazzein-Pentadiplandra brazzeana
                    376#336#       SCKK_TITSE     Neurotoxin Ts-Kapa (Tsk)-(Brazilian scorpion)
                    341#356#       AF2B_SINAL     Cysteine-Rich Antifungal Protein 2b (Afp2b)
                    343#           DEFA_ZOPAT     Defensin, Isoforms B And C-Zophobas atratus
                    361#           DMYC_DROME     Drosomycin Precursor (Cysteine-Rich Peptide)
                    361#376#       SCX5_BUTEU     Insectotoxin I5a-(Lesser Asian scorpion)
                    336#           SCX3_LEIQH     Leiuropeptide II-(Scorpion)
                    203#           SIA1_SORBI     Small-Pr Inhibitor Of Insect Alpha-Amylases

7     34    4.79.3  310#           AB18_PEA       Aba-Responsive Protein Abr18-Garden Pea
                    311#           DRR3_PEA       Disease Resistance Response Protein Pi49
                    231#           MPAA_CORAV     Major Pollen Allergen Cor A 1,-Eu. Hazel
                    312#           L18B_LUPLU     Protein Llr18b (Llpr10.1b)
                    E3.1.-         RNS2_PANGI     Ribonuclease 2 (3.1.-.-)-Panax Ginseng
                    314#           SAM2_SOYBN     Stress-Induced Protein Sam22

7     43    1.26.1  184#           CSF2_SHEEP     Colony-Stimulating Factor
                    381#564#184#   IL4_RAT        Interleukin-4 (B-Cell Igg Diff. Factor)
                    185#           LIF_HUMAN      Leukemia Inhibitory Factor (Lif)
                    187#           PRL_ANGAN      Prolactin Precursor (Prl)-
                    186#           PLF3_MOUSE     Proliferin 3 Mitogen-Regulated
                    188#           SOMA_PAROL     Somatotropin (Growth Hormone)

The most versatile superfamilies in single-domain proteins as determined from their functional description in SWISS- 
PROT, with some representatives. The keyword combinations in the fourth column were based either on the first three 
components of their EC numbers (for enzymes) or derived automatically by comparing the DE description line of 
SWISS-PROT entries to a list of synonymous keywords at http://bioinfo.mbb.yale.edu/partslist/func. A keyword 
number starting with a D indicates an enzyme that does not have an assigned EC number in its description in SWISS-PROT. 



the single-domain proteins (22 vs. 11, respectively), careful 
examination reveals that all the proteins in this category are 
DNA-binding and most of them act as hormone receptors. 

The second entry listed in the table is the combination of 
the 3.4.1 and 4.48.1 superfamilies associated with the FAD/ 
NAD(P)-linked reductases. It is an all-enzymatic combination 
and always carries out an oxido-reductase function. All the 
proteins in this category are completely covered by matches 
with these two superfamilies. The 1.78.1-2.1.1 hemocyanin- 
immunoglobulin combination seems also to be fairly con- 
served; although the proteins in this category are called by 
eight different names, most of them turn out to be extracel- 
lular larval storage proteins, except for the copper-containing 
oxygen carrier hemocyanin itself (HCY_PALVU). 

Following the same logic, we can also determine the 
probability that two proteins that have the same superfamily 
combination share the same function, viz: 

P(F|S) = 1/(1 + 32242/134230) = 81% 

This means that we have significantly greater certainty in de- 
termining the function of a multi-domain protein with a par- 
ticular superfamily combination than that of a single-domain 
protein containing a particular superfamily. We also deter- 
mined a similar probability for those proteins that have an 



almost complete coverage with exactly the same type and 
number of superfamilies, following each other in the same 
order. The probability that the functions are the same in this 
case was 91%, a considerably higher value than above. How- 
ever, if two multi-domain proteins share only a single super- 
family, the probability that they share the same function 
drops to only 35%! This greater functional certainty from 
sharing a combination of superfamilies rather than just one is 
also reflected in Table 1. While one-fourth of the single- 
domain proteins and one-third of singularly matching super- 
families in multi-domain proteins have multiple functions, 
only about one-fifth of the multi-domain combinations pos- 
sess multiple functions (60 of 281). It is also clear from the 
data that domains in larger proteins often lose their original 
function and no longer have an autonomous function. 
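
The counting form of Bayes' theorem used above (Equation 2) is easy to reproduce. The sketch below is a hedged illustration: the function names and the toy input format, a list of (superfamily, function) labels per protein, are assumptions of this example, while the pair counts 134230 and 32242 are the ones quoted in the text for shared superfamily combinations.

```python
from itertools import combinations
from typing import List, Tuple


def count_pairs(proteins: List[Tuple[str, str]]) -> Tuple[int, int]:
    """Count protein pairs that share a structural label.

    Each protein is a (structure_label, function_label) tuple. Returns
    (N(S,F), N(S,~F)): pairs with the same structure and the same
    function, and pairs with the same structure but different functions.
    """
    n_same = n_diff = 0
    for (struct_a, func_a), (struct_b, func_b) in combinations(proteins, 2):
        if struct_a != struct_b:
            continue  # only pairs sharing the structural label are counted
        if func_a == func_b:
            n_same += 1
        else:
            n_diff += 1
    return n_same, n_diff


def same_function_probability(n_same: int, n_diff: int) -> float:
    """P(F|S) = 1 / (1 + N(S,~F) / N(S,F))."""
    return 1.0 / (1.0 + n_diff / n_same)


# Pair counts quoted in the text for shared superfamily combinations.
p_combination = same_function_probability(n_same=134230, n_diff=32242)
print(f"P(F|S) for shared combinations: {p_combination:.0%}")
```

With the quoted counts this evaluates to roughly 0.81, in line with the 81% stated for multi-domain proteins sharing the same superfamily combination.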

Seventy Common Superfamilies and Their 
Functions Compared in Single-Domain 
and Multi-Domain Proteins 

As mentioned above, of the 455 superfamilies in our analysis, 
only 70 occur in both single- and multi-domain proteins. 
Even more surprising is the small number of structural super- 
families (14) that have the same function in both single- and 
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Table 3. Most Versatile Superfamily Combinations in Multi-Domain Proteins 

No.   No.   Sfam
func  prot  comb.          Function     SWISS-PROT ID  SWISS-PROT function

22    176   1.95.1/7.33.1  29#          THB_RANCA      Thyroid Hormone Receptor Beta
                           10#          HNF4_DROME     Transcription Factor HNF-4 Homolog
                           31#32#       EAR2_MOUSE     V-Erba Related Protein Ear-2
                           29#30#       ECR_MANSE      Ecdysone Receptor (Ecdysteroid Receptor)
                           32#          ERBA_AVIER     Erba Oncogene Protein
                           556#564#35#  NCFI_XENLA     Nerve Growth Factor Induced Protein I-B
                           576#         NR42_HUMAN     Immediate-Early Response Protein Not
                           36#          PPAT_HUMAN     Peroxisome Proliferator Activated Receptor
                           37#          RXTC_CHICK     Retinoic Acid Receptor RXR-Gamma
                           38#          TLL_DROVI      Tailless Protein

8     54    3.4.1/4.48.1   E1.8.2       DHSU_CHRVI     Sulfide Dehydrogenase (1.8.2.-)
                           E1.8.1       DLDH_ZYMMO     Dihydrolipoamide Dehydrogenase (1.8.1.4)
                           E1.6.4       TYTR_TRYCR     Trypanothione Reductase (1.6.4.8) (Tr)
                           E1.16.1      MERA_STRLI     Mercuric Reductase (1.16.1.1)
                           E1.6.99      NAOX_MYCPN     Probable NADH Oxidase (1.6.99.3) (Noxase)

8     23    1.78.1/2.1.1   19#          ARYB_MANSE     Arylphorin Beta Subunit-(Tobacco Hornworm)
                           20#          CRPI_PERAM     Allergen Cr-Pi Precursor-(American Cockroach)
                           21#427#      HCY_PALVU      Hemocyanin-(European Spiny Lobster)
                           22#          HEXA_BLADI     Hexamerin Precursor-(Tropical Cockroach)
                           23#          JSP1_TRINI     Acidic Juvenile Hormone-Suppressible Protein
                           24#          LSP2_DROME     Larval Serum Protein 2 Precursor (LSP-2)
                           546#25#      SSP1_BOMMO     Sex-Specific Storage-Protein 1

Note that the combination with the greatest number of different functions is that of 1.95.1 and 7.33.1. Careful 
examination reveals that all the proteins with this combination are DNA-binding and most of them act as various 
hormone receptors. In particular, HNF4_DROME and NR42_HUMAN also have transcription activator functions. Note 
that these two proteins are considerably longer than the others in this group and are not covered completely by 
structural matches: A large C-terminal and a large N-terminal portion are left uncovered, respectively. 



multi-domain proteins. These are listed in Table 4; 12 of them 
have enzymatic function, supporting the notion that en- 
zymes are more conserved during evolution than non- 
enzymes. The two non-enzymatic superfamilies are the 4.29.1 
ribosomal superfamily and the 5.4.1 superfamily in penicillin- 
binding proteins. 

Table 5 presents several examples of the converse situa- 
tion, shared superfamilies that have different functions in 
single and multi-domain proteins. Comparing parts A and B 
of the table highlights the fact that although both superfami- 



lies in a multi-domain protein are often present in single- 
domain form as well, the functions in the different settings 
are only vaguely related. One example is the combination of 
the lipocalin superfamily (2.45.1) with that of the BPTI-like or 
Kunitz inhibitor (7.7.1), which in higher organisms forms a 
complex protein called alpha-1-microglobulin (AMBP_RAT). 
Another interesting example is the combination of the 2.5.1 
Cupredoxin (occurring in the single-domain blue-copper pro- 
tein, SOXE_SULAC) and the 6.5.1 Membrane all-alpha 
(single-domain representative: BACT_HALVA, a sensory rho- 



Table 4. Superfamilies With the Same Function in Single- and Multi-Domain Proteins as Determined from Their Keyword 
Combination or First Three Components of Their EC Numbers 

                  Single-domain proteins                                     Multi-domain proteins

                  SWISS-PROT                                                 SWISS-PROT
Sfam    Function  ID           SWISS-PROT function                           ID           SWISS-PROT function

1.81.1  E3.2.1    GUNY_ERWCH   Endoglucanase (3.2.1.4)                       AMYG_NEUCR   Glucoamylase Precursor (3.2.1.3)
2.66.2  E3.5.1    URE2_YERPS   Urease Beta (3.5.1.5)                         URE1_HELPY   Urease Alpha Subunit (3.5.1.5)
3.17.2  E6.3.5    NADE_MYCPN   NAD(+) Synthetase (6.3.5.1)                   GUAA_YEAST   GMP Synthase (6.3.5.2)
3.37.1  E3.1.3    PTP2_NPVOP   Protein-Tyrosine Phosphatase 2 (3.1.3.48)     PTNB_RAT     Protein-Tyrosine Phosphatase (3.1.3.48)
3.67.1  E4.2.1    TRPB_VIBPA   Tryptophan Synthase (4.2.1.20)                TRP_YEAST    Tryptophan Synthase (4.2.1.20)
4.19.1  E5.2.1    FKB1_METJA   Peptidylprolyl Cis-Trans Isomerase (5.2.1.8)  FKB7_WHEAT   70 Kd Peptidylprolyl Isomerase (5.2.1.8)
4.2.1   E3.2.1    LYCV_BPP2    Lysozyme (3.2.1.17)                           CHIX_PEA     Endochitinase Precursor (3.2.1.14)
4.29.1  85#       RS5_ACYKS    30s Ribosomal Protein S5                      RS5_TREPA    30s Ribosomal Protein S5
4.52.1  E3.4.24   SNPA_STRCS   Extracellular Neutral Protease (3.4.24.-)     BMPH_STRPU   Collagenase 3 Precursor (3.4.24.-)
4.6.1   E3.5.1    URE3_YERPS   Urease Gamma (3.5.1.5)                        URE1_HELPY   Urease Alpha Subunit (3.5.1.5)
5.10.1  E2.7.7    KANU_STAAU   Kanamycin Nucleotidyltransferase (2.7.7.-)    DPOB_XENLA   Dna Polymerase Beta (2.7.7.7)
5.4.1   161#      AMPH_ECOLI   Penicillin-binding Protein Amph               PBPX_STRPN   Penicillin-binding Protein 3x Pbp2x
Table 5. Examples of Superfamilies Present in Both Single- and Multi-Domain Proteins,
Carrying out Different Functions

Table 5A. Single-Domain Proteins

Sfam      Funct #     SWISS-PROT ID  SWISS-PROT function
1.25.1    352#        FTN2_HAEIN     Ferritin-like Protein 2
          183#        NIGY_DESVH     Nigerythrin
          E1.17.4     RIR4_YEAST     (Ribonucleotide Reductase) (1.17.4.1)
          192#        NLP_HAEIN      Ner-like Protein Homolog
1.4.3     196#        H1A_PLADU      Histone H1a, Sperm
1.81.2    E2.5.1      PFTB_PEA       Farnesyltransferase Beta Su (2.5.1.-)
2.45.1    226#        ERBP_RAT       Epididymal-Retinoic Acid Binding Protein
          227#        FAB3_CAEEL     Fatty Acid-Binding Protein Homolog 3
          228#412#    NGAL_MOUSE     Neutrophil Gelatinase-Assoc. Lipocalin
          229#        NP4_RHOPR      Nitrophorin 4 Precursor
          E5.3.99     PGHD_HUMAN     Prostaglandin-H2 D-isomerase (5.3.99.2)
          230#421#    VNS1_MOUSE     Vomeronasal Secretory Protein 1
2.5.1     231#        MPA3_AMBEL     Pollen Allergen Amb a 3 (Amb a III)
          232#427#    SOXE_SULAC     Sulfocyanin (Blue Copper Protein)
3.14.2    373#        RRF1_DESVH     Rrf1 Protein
3.29.1    E6.3.4      PURA_CAEEL     Adenylosuccinate Synthetase (6.3.4.4)
          [illegible] [illegible]    [illegible]
          D259#       VA57_VACCV     Guanylate Kinase Homolog
          E2.7.1      KITH_VZVW      Thymidine Kinase (2.7.1.21)
3.47.1    275#        MBL_BACSU      MBL Protein
          276#        MREB_BACSU     Rod Shape-determining Protein Mreb
3.48.1    E3.1.3      PPA5_YEAST     Repressible Acid Phosphatase (3.1.3.2)
3.81.1    D281#       AMIC_PSEAE     Aliphatic Amidase Expression-Regulator
          282#        LUXP_VIBHA     LuxP Protein Precursor
4.103.1   E2.4.2      TOX1_BORPE     Pertussis Toxin Su 1 (2.4.2.-)
4.105.1   291#        LECC_POLMI     Lectin-Polyandrocarpa Misakiensis
4.11.5    295#        TERP_PSESP     Terpredoxin
4.19.1    E5.2.1      FKB1_METJA     Pept-Prolyl Cis-Trans Isomerase (5.2.1.8)
6.5.1     E3.6.1      ATPL_VIBAL     ATP Synthase (3.6.1.34) (Lipid-binding)
          540#325#    BACT_HALVA     Sensory Rhodopsin II (SR-II)
7.35.4    E1.9.3      COXB_RAT       Cytochrome C Oxidase (1.9.3.1) (VIa*)
          345#        DESR_DESGI     Desulforedoxin (Dx)
7.7.1     349#        TAP_ORNMO      Tick Anticoagulant Peptide

(Table continues on following page.) 



dopsin) superfamilies into a component of the respiratory 
chain, cytochrome C oxidase II (COX2_ZOOAN). All these 
examples demonstrate the evolutionary advantage of a do- 
main fusion event, which creates a function that is more com- 
plex than either of the components. 

Multifunctionality vs. Sequence Similarity 

Previously, we presented a variety of graphs that show how 
the probability that two domains would share the same func- 
tion varied with respect to sequence similarity (Hegyi and 



Gerstein 1999; Wilson et al. 2000). Figure 4 shows a similar 
graph with the calculations extended to multi-domain pro- 
teins. The figure shows that the functional divergence of a 
single domain in multi-domain proteins dramatically in- 
creases, more than twofold, compared to the single-domain 
ones. This reinforces our findings above, based only on super- 
family content, that the certainty with which we can predict 
the function of a protein based on its sequence similarity with 
a domain in another multi-domain protein, is considerably 
less than for a comparable single-domain situation. 
Table 5B. Multi-Domain Proteins

Sfam Comb.      Funct #     SWISS-PROT ID  SWISS-PROT function
1.25.1/7.35.4   104#        RUBY_METJA     Putative Rubrerythrin
1.32.1/3.81.1   11#         PURR_HAEIN     Purine Nucleotide Synthesis Repressor
                12#         DEGA_BACSU     Degradation Activator
                581#11#     SCRR_STRMU     Sucrose Operon Repressor
                582#11#     REGA_CLOAB     Transcription Regulatory Protein Rega
1.4.3/3.14.2    10#         SKN7_YEAST     Transcription Factor Skn7 (Pos9 Protein)
                11#         VIRG_AGRT5     Virg Regulatory Protein
                13#         RGX3_MYCTU     Sensory Transduction Protein REGX3
                190#        PFER_PSEAE     Transcriptional Activator Protein Pfer
                366#        PETR_RHOCA     Petr Protein
2.45.1/7.7.1    203#153#    HC_RAT         Alpha-1-Microglobulin/Trypsin Inhibitor
2.5.1/6.5.1     E1.9.3      COX2_ZOOAN     Cytochrome C Oxidase II (1.9.3.1)
3.29.1/3.48.1   E2.7.1      F26_RANCA      6-Phosphofructo-2-kinase (2.7.1.105)
3.47.1/5.17.1   1#          YEDO_YEAST     Heat Shock Protein 70 Homolog YEL030w
                1#83#       CR73_MAIZE     Ig-Binding Protein


DISCUSSION 

Here we built on our previous studies on the relationship 
between protein structure and function to develop new re- 
sults related to multi-domain proteins. Throughout the paper, 
we focused on superfamilies instead of folds, as the members 
of a superfamily are presumably of common evolutionary ori- 
gin (Murzin et al. 1995). 

We found that the 4763 multi-domain and 1818 single- 
domain proteins that met our selection criteria have about 
the same distribution of structural classes, with more enzy- 
matic functions associated with the alpha/beta structural 
classes and more non-enzymatic ones with the all-alpha and 
small classes. We identified more than three times as many 
multi-domain proteins that were enzymes than single- 
domain ones (2805 and 850, respectively) and, conversely, 
about twice as many multi-domain proteins as single-domain 
ones that were non-enzymes (1958 vs. 968). 
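
The counts quoted in the two sentences above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the variable names are our own and do not come from the article, and the figures are simply those reported in the text.

```python
# Counts as reported in the Discussion (variable names are illustrative).
multi = {"enzyme": 2805, "non_enzyme": 1958}
single = {"enzyme": 850, "non_enzyme": 968}

# The enzyme and non-enzyme counts should add up to the stated totals.
multi_total = sum(multi.values())    # stated in the text as 4763
single_total = sum(single.values())  # stated in the text as 1818

# Ratios behind "more than three times" and "about twice as many".
enzyme_ratio = multi["enzyme"] / single["enzyme"]
non_enzyme_ratio = multi["non_enzyme"] / single["non_enzyme"]

print(f"multi-domain total:  {multi_total}")
print(f"single-domain total: {single_total}")
print(f"enzyme ratio (multi/single):     {enzyme_ratio:.2f}")
print(f"non-enzyme ratio (multi/single): {non_enzyme_ratio:.2f}")
```

The totals agree with the 4763 and 1818 proteins stated at the start of the paragraph, and the two ratios come out at roughly 3.3 and 2.0.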

We focused on the functional divergence of the two 
groups and found that about a quarter of the superfamilies in 
single-domain proteins are associated with multiple func- 
tions, whereas only about a fifth of the multi-domain super- 
family combinations are. Therefore, we can conclude that a 
combination of specific superfamilies results in a more spe- 
cific functional assignment for a particular protein. However, 
about one-third of the superfamilies in the multi-domain pro- 
teins were associated with multiple functions, underlining 
the lesser autonomy of a domain function in multi-domain 
protein. 

This latter finding was also supported by the difference 
in functional divergences between the two groups of proteins 
based on particular sequence similarities between the do- 
mains and SWISS-PROT proteins. As is shown in Figure 4, the 
average functional divergence of a single domain is much 
larger (more than twofold) in multi-domain proteins than in 
single-domain ones. 

We also found that only 70 of a total of 455 superfamilies 
are shared between the multi-domain and single-domain pro- 
teins and only a small fraction (14) share their functions. This 



was rather surprising to us, and should be taken into consid- 
eration in functional characterization and annotation of new 
gene products. When the functions were related in single- and 
multi-domain proteins, we could observe an increasing func- 
tional complexity with the appearance of large multi-domain 
proteins. 
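
To put the overlap figures above in proportion, the fractions can be worked out directly. This is a minimal sketch using only the three numbers given in the text; the variable names are assumptions made for illustration.

```python
# Figures quoted in the Discussion (names are illustrative).
total_superfamilies = 455   # all superfamilies considered
shared = 70                 # present in both single- and multi-domain proteins
shared_same_function = 14   # shared and carrying the same function

# Share of all superfamilies that occur in both groups.
print(f"shared: {shared / total_superfamilies:.1%}")                         # 15.4%
# Share of the shared superfamilies that keep the same function.
print(f"same function, of shared: {shared_same_function / shared:.1%}")      # 20.0%
# Share of all superfamilies that are shared and keep the same function.
print(f"same function, of all: {shared_same_function / total_superfamilies:.1%}")  # 3.1%
```

On these numbers, only about one in five of the shared superfamilies retains its function across the two groups.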

Altogether, with the recent sequencing of the human
genome and the genomes of other model organisms, we hope
that this work can contribute to the successful annotation of
the individual gene products, and will help to avoid some
pitfalls associated with the functional characterization of
large, complex proteins.

Figure 4 Divergence in function with respect to sequence similarity.
Relative number of matching domains with multiple functions, as
a function of e-value threshold. Diamonds represent single-domain
proteins, squares multi-domain ones (matching just for a single
domain). The first value on the X-axis starts at 4 (corresponding
to an e-value = 10^-4).

The publication costs of this article were defrayed in part 
by payment of page charges. This article must therefore be 
hereby marked "advertisement" in accordance with 18 USC 
section 1734 solely to indicate this fact. 
Figure 2-42

Structure of a β-turn. The CO group
of residue 1 of the tetrapeptide shown
here is hydrogen bonded to the NH
group of residue 4, which results in
a hairpin turn.



POLYPEPTIDE CHAINS CAN REVERSE DIRECTION
BY MAKING β-TURNS

Most proteins have compact, globular shapes due to frequent
reversals of the direction of their polypeptide chains. Analyses of the
three-dimensional structures of numerous proteins have revealed
that many of these chain reversals are accomplished by a common
structural element called the β-turn. The essence of this hairpin turn
is that the CO group of residue n of a polypeptide is hydrogen
bonded to the NH group of residue (n + 3) (Figure 2-42). Thus, a
polypeptide chain can abruptly reverse its direction.

LEVELS OF STRUCTURE IN PROTEIN ARCHITECTURE 

In discussing the architecture of proteins, it is convenient to refer to
four levels of structure. Primary structure is simply the sequence of
amino acids and location of disulfide bridges, if there are any. The
primary structure is thus a complete description of the covalent
connections of a protein. Secondary structure refers to the steric
relationship of amino acid residues that are close to one another in the
linear sequence. Some of these steric relationships are of a regular
kind, giving rise to a periodic structure. The α helix, the β pleated
sheet, and the collagen helix are examples of secondary structure.
Tertiary structure refers to the steric relationship of amino acid
residues that are far apart in the linear sequence. It should be noted
that the dividing line between secondary and tertiary structure is
arbitrary. Proteins that contain more than one polypeptide chain
display an additional level of structural organization, namely
quaternary structure, which refers to the way in which the chains are
packed together. Each polypeptide chain in such a protein is called
a subunit. Another useful term is domain, which refers to a compact
globular unit of protein structure. Many proteins fold into domains
having masses that range from 10 to 20 kdal. The domains of large



H-C— 0-0 H 
Performic acid 



AMINO ACID SEQUENCE SPECIFIES 
THREE-DIMENSIONAL STRUCTURE 

Insight into the relation between the amino acid sequence of a
protein and its conformation came from the work of Christian
Anfinsen on ribonuclease, an enzyme that hydrolyzes RNA.
Ribonuclease is a single polypeptide chain consisting of 124 amino acid
residues (Figure 2-43). It contains four disulfide bonds, which can
be irreversibly oxidized by performic acid to give cysteic acid residues
IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

DECLARATION OF JOHN C. ROCKETT, Ph.D. 
UNDER 37 C.F.R. § 1.132 

I, JOHN COUGHLIN ROCKETT III, Ph.D., declare and 
state as follows: 

1. Since 1995 I have been engaged full-time in 
molecular toxicology research, with an emphasis on the 
application of expression profiling techniques, including but 
not limited to nucleic acid microarray expression profiling 
techniques, to studies of the mechanisms of toxicant action 
and to the design of assays to monitor toxicant exposure. 

2. My curriculum vitae, including my list of 
publications, is attached hereto as Exhibit A. 

3. For the past 5 years, my work has focused 
primarily on analyzing the effects of potentially hazardous 
environmental agents, such as heat, water disinfectant 
byproducts, and conazole fungicides on the male reproductive 
tract. Although we are interested in the basic mechanisms of 
action of such toxicants, we also have two practical goals in 
mind: first, to identify individual agents and families of 
agents that adversely affect male reproductive development and 
function, and second, to develop methods for monitoring human 
exposure to such agents, particularly methods capable of 
identifying toxicant exposure at an early stage. 

4. I have relied on expression profiling as a 
principal approach to these goals. Expression profiling, by 



reporting the expression levels of thousands of genes 
simultaneously, gives us an opportunity to identify and group 
toxicants based on similarities in the patterns of gene 
expression they induce in cells and tissues; the gene 
expression profiles induced by treatment with known testicular 
toxins serve as standards, molecular signatures or molecular 
fingerprints as it were, against which the patterns of gene 
expression induced by agents of unknown toxicity may be 
compared and judged. In addition, gene expression profiling 
may give us the opportunity to detect toxicity before more 
gross phenotypic changes become manifest. 

5. In keeping with this research emphasis, I have 
until recently: 

served on the Microarray Technical 
Subcommittee of the United States Environmental 
Protection Agency (EPA) Genomics Task Force, and 

served on the Scientific Committee for 
the conference series on "Critical Assessment of 
Techniques for Microarray Data Analysis," held 
annually at Duke University, Durham, NC; 

and I currently 



serve on the Technical Committee on the 
Application of Genomics to Mechanism- Based Risk 
Assessment of the International Life Sciences 
Institute's Health and Environmental Sciences 
Institute, 

serve on the Genomics and Proteomics 
Committee of the National Health and Environmental 
Effects Research Laboratory of the EPA's Office of 
Research and Development, 

belong to the [North Carolina Research] 
Triangle Array Users Group, 



belong to the Molecular Biology 

Speciality Section of the Society of Toxicology, 
and 

belong to the Triangle Consortium for 
Reproductive Biology. 

In addition, I am the principal investigator on a cooperative 
research and development agreement (CRADA) entitled 
"Development of a Genetic Test for Male Factor Infertility." 
Prior to this, I was a co-principal investigator on a 
materials cooperative research and development agreement 
(MCRADA) to print oligonucleotide-based microarrays; and from 
1999 - 2002, I was coinvestigator on a CRADA to develop gene 
microarrays for toxicology applications. 

6. I presume the reader's familiarity with the 
basic construction and operation of microarrays. For purposes 
of the discussion to follow, I use the phrase "nucleic acid 
microarray" and, equivalently, the term "microarray" to refer 
generically to the various types of nucleic acid microarray 
that include immobilized nucleic acid probes of sufficient 
length to permit specific binding, with minimal cross- 
hybridization, to the probe's cognate transcript, whether the 
transcript is in the form of RNA or DNA. Although this 
definition excludes microarrays having shorter probes, such as 
the 20-mer probes of arrays manufactured by Affymetrix, Inc., 
many of the comments that follow nonetheless apply to such 
microarrays as well. 

7. Although my own work with microarrays dates 
back only to 1998, and high density spotted nucleic acid 
microarrays themselves date back perhaps only to 1995, 1
microarrays are by no means the only, nor the first, 
expression profiling tool. As I describe in detail in my 
Xenobiotica review, 2 there are a number of other differential 
expression analysis technologies that precede the development 
of microarrays, some by decades, and that have been applied to 
drug metabolism and toxicology research, including: 
(1) differential screening; (2) subtractive hybridization, 
including variants such as chemical cross-linking subtraction, 
suppression-PCR subtractive hybridization and representational 
difference analysis; (3) differential display; (4) restriction 
endonuclease facilitated analyses, including serial analysis 
of gene expression (SAGE) and gene expression fingerprinting; 
and (5) EST analysis. 

8. In my own earlier research, I used both
reverse-transcriptase polymerase chain reaction (RT-PCR) and
suppression-PCR subtractive hybridization (SSH) to study
patterns of differential gene expression caused by hepatic
challenge with nongenotoxic and genotoxic hepatotoxins. 3



1 Schena et al., "Quantitative monitoring of gene expression patterns
with a complementary DNA microarray," Science 270:467-470 (1995), attached
hereto as Exhibit B.

2 Rockett et al., "Differential gene expression in drug metabolism and
toxicology: practicalities, problems and potential," Xenobiotica 29:655-691
(1999) (hereinafter, "Xenobiotica review"), attached hereto as Exhibit C.

3 See, e.g., Rockett et al., "Molecular profiling of non-genotoxic
carcinogenesis using differential display reverse transcription polymerase
chain reaction (ddRT-PCR)," Eur. J. Drug Metabolism & Pharmacokinetics
22(4):329-33 (1997), and Rockett et al., "Use of a suppression-PCR
subtractive hybridization method to identify gene species which demonstrate
altered expression in male rat and guinea pig livers following 3-day
exposure to [4-chloro-6-(2,3-xylidino)-2-pyrimidinylthio]acetic acid,"
Toxicology 144(1-3):13-29 (2000), attached hereto respectively as Exhibits
D and E.



9. These older transcript expression-profiling
techniques provide analogous expression data, but with far
lower throughput.

10. It has been well-established, at least since 
the introduction of high density spotted microarrays in 1995, 
that:

(i) each probe on the microarray, with
careful design and sufficient length, and with
sufficiently stringent hybridization and wash
conditions, binds specifically and with minimal
cross-hybridization, to the probe's cognate
transcript;

(ii) each additional probe makes an 
additional transcript newly detectable by the 
microarray, increasing the detection range, and 
thus versatility, of this analytical device for 
gene expression profiling; 4 

(iii) it is not necessary that the 
biological function be known in order for the gene, 



4 The compelling logic of this proposition has likely motivated the
remarkably rapid progress from the earliest high density spotted arrays in
1995 (Schena et al., "Quantitative monitoring of gene expression patterns
with a complementary DNA microarray," Science 270:467-470 (1995), attached
hereto as Exhibit B), to the first whole genome arrays in 1997 (Lashkari et
al., "Yeast microarrays for genome wide parallel genetic and gene
expression analysis," Proc. Natl. Acad. Sci. USA 94(24):13057-62 (1997), and
DeRisi et al., "Exploring the metabolic and genetic control of gene
expression on a genomic scale," Science 278(5338):680-6 (1997), attached
hereto as Exhibits F and G, respectively), to the concurrent announcement
by two companies earlier this month of their respective commercial
introductions of single chip human whole genome arrays (Pollack, "Human
Genome Placed on Chip; Biotech Rivals Put it Up for Sale," The New York
Times, Thursday, October 2, 2003 (Business Day), attached hereto as
Exhibit H; "Agilent Technologies ships whole human genome on single
microarray to gene expression customers for evaluation," Press Release,
Agilent Technologies, October 2, 2003, attached hereto as Exhibit I;
"Affymetrix Announces Commercial Launch of Single Array for Human Genome
Expression Analysis; More Than 1 Million Probes Analyze Expression Levels
of Nearly 50,000 RNA Transcripts and Variants on a Single Array the Size of
a Thumbnail," Press Release, Affymetrix, October 2, 2003, attached hereto
as Exhibit J).
or a fragment of the gene, to prove useful as a
probe on a microarray to be used for expression
analysis;

(iv) failure of a probe to detect changes
in expression of its cognate gene does not diminish
the usefulness of the probe on the microarray; and

(v) failure of a probe to detect a
particular transcript in any single experiment does
not deprive the probe of usefulness to the
community of users who would use this research
tool.

These principles also apply to transcript expression profiling 
techniques that antedate the development of high density 
spotted microarrays, and accordingly were well-understood 
prior to 1995. 

11. Moreover, expression profiling is not limited 
to the measurement of mRNA transcript levels. It is widely 
understood among molecular and cellular biologists that 
protein expression levels provide complementary profiles for 
any given cell and cellular state. Although I cannot claim 
credit for having coined the phrase, I have written that the 
difference between transcript expression profiling and protein 
expression profiling is that "transcriptomics indicates what
should happen, and proteomics shows what is happening." 5

12. For decades, such protein expression profiles 
have been generated using two dimensional polyacrylamide gel 



5 Rockett, "Macroresults through Microarrays," Drug Discovery Today
7:804-805 (2002) (emphasis added), attached hereto as Exhibit K.



electrophoresis (2D-PAGE), and used, among other things, to
study drug effects. 6

13. Although the protein expression profiles
produced by 2D-PAGE analysis are analogous to the transcript
expression profiles provided by nucleic acid microarrays, an 
even closer analogy is perhaps offered by antibody 
microarrays; as I note in my Drug Discovery Today commentary, 
such antibody microarrays date back to the work of Roger Ekins 
in the mid- to late-1980s. 7 

14. The principles in paragraph 10 also apply to 
protein expression profiling analyses, particularly to 
analyses performed using antibody microarrays. Thus, as with 
nucleic acid microarrays, the greater the number of proteins 
detectable, the greater the power of the technique; the 
absence or failure of a protein to change in expression levels 
does not diminish the usefulness of the method; and prior 
knowledge of the biological function of the protein is not 
required. As applied to protein expression profiling, these 
principles have been well understood since at least as early 
as the 1980s. 

15. Both gene and protein expression profiling are 
particularly useful to the toxicologist, especially in the 
pharmaceutical industry. Accordingly, I made the following 



6 See, e.g., Anderson et al., "A two-dimensional gel database of rat
liver proteins useful in gene regulation and drug effects studies,"
Electrophoresis 12:907-930 (1991), attached hereto as Exhibit L.

7 See Ekins et al., J. Bioluminescence Chemiluminescence 5:59-78
(1989); Ekins et al., Clin. Chem. 37:1955-1965 (1991); and Ekins, U.S.
Patent Nos. 5,432,099, 5,807,755, and 5,837,551, attached hereto 
respectively as Exhibits M to Q. 



statements in my Xenobiotica review, written in the summer of
1998: 

[I]n the field of chemical-induced 
toxicity, it is now becoming increasingly obvious 
that most adverse reactions to drugs and chemicals 
are the result of multiple gene regulation, some of 
which are causal and some of which are casually- 
related to the toxicological phenomenon per se. 
This observation has led to an upsurge in interest 
in gene-profiling technologies which differentiate 
between the control and toxin-treated gene pools in
target tissues and is, therefore, of value in 
rationalizing the molecular mechanisms of 
xenobiotic-induced toxicity. 

Knowledge of toxin-dependent gene 
regulation in target tissues is not solely an 
academic pursuit as much interest has been 
generated in the pharmaceutical industry to harness 
this technology in the early identification of 
toxic drug candidates, thereby shortening the 
developmental process and contributing 
substantially to the safety assessment of new 
drugs.

For example, if the gene profile in 
response to say a testicular toxin that has been 
well-characterized in vivo could be determined in 
the testis, then this profile would be 
representative of all new drug candidates which act 
via this specific molecular mechanism of toxicity, 
thereby providing a useful and coherent approach to 
the early detection of such toxicants. 

Whereas it would be informative to know 
the identity and functionality of all genes up/down 
regulated by such toxicants, this would appear a 
longer term goal, as the majority of human genes 
have not yet been sequenced, far less their 
functionality determined. However, the current use 
of gene profiling yields a pattern of gene changes 
for a xenobiotic of unknown toxicity which may be 
matched to that of well-characterized toxins, thus 
alerting the toxicologist to possible in vivo 
similarities between the unknown and the 
standard. . . . 



Despite the development of multiple 
technological advances which have recently brought 
the field of gene expression profiling to the 
forefront of molecular analysis, recognition of the 
importance of differential gene expression and 
characterization of differentially expressed genes 
has existed for many years. 



16. As noted in the preceding excerpt from my 
Xenobiotica review, expression profiling in toxicology studies 
yields patterns of changes that are characteristic of an agent 
of unknown toxicity, which patterns may usefully be matched to 
those of well-characterized toxins. 

17. In the context of such patterns of gene 
expression, each additional gene-specific probe provides an 
additional signal that could not otherwise have been detected, 
giving a more comprehensive, robust, higher resolution — and 
thus more useful — pattern than otherwise would have been 
possible. 8

18. It is my opinion, therefore, based on the state 
of the art in toxicology at least since the mid-1990s -- and 
as regards protein profiling, even earlier — that disclosure 
of the sequence of a new gene or protein, with or without 
knowledge of its biological function, would have been 



8 In a sense, each gene-specific probe used in such an analysis is 
analogous to a different one of the many parts of an engine, with each 
individual part, or subcombinations of such parts, deriving at least part
of their usefulness from the utility of the completed combination, the 
functioning engine. 



sufficient information for a toxicologist to use the gene 
and/or protein in expression profiling studies in toxicology. 



19. The statements made in this declaration
represent my individual views and are not intended to
represent the opinion of my employer, the United States 
Environmental Protection Agency, or of any other branch of the 
federal government. Other than my current engagement to 
provide this declaration, I have neither had, nor currently 
have, financial ties to, or financial interest in, Incyte 
Corporation. I am not myself an inventor on any patent 
application claiming a gene or gene fragment. 



20. I declare further that all statements made
herein of my own knowledge are true and that all statements
made on information and belief are believed to be true, and
further that these statements were made with the knowledge
that willful false statements and the like so made are
punishable by fine or imprisonment, or both, under
Section 1001 of Title 18 of the United States Code and may
jeopardize the validity of any patent application in which
this declaration is filed or any patent that issues thereon.



John Coughlin Rockett III, Ph.D. 
Name: John Coughlin Rockett III

Nationality: USA

Work Address: United States Environmental Protection Agency
National Health and Environmental Effects Research Laboratory
Reproductive Toxicology Division (MD-72)
Gamete and Early Embryo Biology Branch
Research Triangle Park
NC 27711
USA

Work Telephone: +001 (919) 541 2678

Work Fax: +001 (919) 541 4017

Work E-mail: rockett.john@epa.gov



Employment and Higher Education 

CURRENT POSITION (12/00-present) 
Research Biologist 

Gamete and Early Embryo Biology Branch (MD-72) 
Reproductive Toxicology Division 

National Health and Environmental Effects Research Laboratory 

US Environmental Protection Agency 

Research Triangle Park 

NC 27711 

USA 



PREVIOUS POSITIONS 

8/98-12/00: NHEERL Post-Doctoral Research Fellow, Gamete and Early Embryo Biology 
Branch, Reproductive Toxicology Division, National Health and Environmental Effects 
Research Laboratory, United States Environmental Protection Agency, Research Triangle Park, 
NC, USA. 

Supervisors: Dr Sally P. Darney (Scientific publications under Sally D. Perreault) and Dr David 
J. Dix. 

5/95-7/98: Rhone-Poulenc Post-Doctoral Research Fellow, Molecular Toxicology Group, School 
of Biological Sciences, University of Surrey, Guildford, Surrey, England. 
Supervisor: Prof. G. Gordon Gibson. 



EDUCATION 

Ph.D., 1995 - University of Warwick, Coventry, W. Midlands, England 

Title: Transforming Growth Factor-β and Immune Recognition Molecules in Oesophageal 

Cancer. 

Supervisors: Dr Alan G. Morris (University of Warwick) and Dr S. Jane Darnton (Birmingham 
Heartlands Hospital) 

B.Sc. (Hons.), 1991 - University of Warwick, Coventry, W. Midlands, England. 

Degree: Microbiology and Microbial Technology (with intercalated year in industry), Class 2i. 

Tutor: Professor Howard Dalton. 
PROFESSIONAL ACTIVITIES

Membership of Professional Societies: 

Society of Toxicology (Inc. Molecular Biology Speciality Section) (2001-present)
Science Advisory Board (2001-present)

North Carolina Chapter of the Society of Toxicology (1999-present)

Triangle Consortium for Reproductive Biology (1999-present)

Triangle Array Users Group (1999-present)

Institute of Biology (U.K.) (1989-present)

British Toxicology Society (1996-2000)

Biochemical Society (U.K.) (1992-1995)

British Society for Immunology (1992-1995)



Membership of Scientific Committees: 



International Life Sciences Institute's (ILSI) Health and Environmental Sciences Institute (HESI) 
Technical Committee on the Application of Genomics to Mechanism-Based Risk Assessment: 

• Steering Committee (5/02-present). 

• Hepatotoxicity Working Group Vice-Chair (5/02-present). 

• Hepatotoxicity Work Group Member (5/01-present). 

Charter member, Fertility and Early Pregnancy Work Group of the National Children's Study 
(07/01-Present). 

National Health and Environmental Effects Research Laboratory Distinguished Lecture Series 
Committee (July 03-present). 

U.S. Environmental Protection Agency Genomics Task Force Microarray Technical 
Subcommittee (August 03-present). 

National Health and Environmental Effects Research Laboratory Genomics and Proteomics 
Committee (NGPC) (September 03-present). 



Professional Meetings: 



Invited participant ("Observer") in Expert Panel Workshop: "The Role of Environmental Factors 
on the Onset and Progression of Puberty in Children". Organised by Serono Symposia 
International. November 6-%*, 2003, Chicago, IL, USA. 



Joint organiser and co-chair of: "Genomic analysis of surrogate tissues for measuring toxic 
exposures and drug action", the "Innovations in Applied Toxicology" Symposium for the Society 
of Toxicology 42nd Annual Meeting, March 9th-13th, 2003, Salt Lake City, UT, USA.



(8) John C Rockett, David J Esdaile and G Gordon Gibson (1999). Differential gene expression
in drug metabolism: practicalities, problems and potential. Xenobiotica, 29(7):655-691.
(7) MC Murphy, CN Brookes, JC Rockett, C Chapman, JA Lovegrove, BJ Gould, JW Wright and 
CM Williams (1999). The quantitation of lipoprotein lipase mRNA in biopsies of human adipose 
tissue, using the polymerase chain reaction, and the effect of increased consumption of n-3 
polyunsaturated fatty acids. European Journal of Clinical Nutrition, 53:441-447. 

(6) JC Rockett, DJ Esdaile and GG Gibson (1997). Molecular profiling of non-genotoxic 
carcinogenesis using differential display reverse transcription polymerase chain reaction (ddRT- 
PCR). European Journal of Drug Metabolism & Pharmacokinetics 22(4):329-33. 

(5) Rockett, J., Larkin, K., Darnton, S., Morris, A. and Matthews, H. (1997). Five newly
established oesophageal carcinoma cell lines: phenotypic and immunological characterisation.
British Journal of Cancer 75(2):258-263.

(4) J C Rockett, S J Darnton, J Crocker, H R Matthews and A G Morris (1996). Lymphocyte
infiltration in oesophageal carcinoma: lack of correlation with MHC antigens, ICAM-1, and tumour
stage and grade. Journal of Clinical Pathology 49:264-267.

(3) J C Rockett, S J Darnton, J Crocker, H R Matthews and A G Morris (1995). Expression of
HLA-ABC and HLA-DR histocompatibility antigens and intercellular adhesion molecule-1 in
oesophageal carcinoma. Journal of Clinical Pathology 48:539-44.

(2) Salam M, Rockett J and Morris A (1995). The prevalence of different human papillomavirus
types and p53 mutations in laryngeal carcinomas: is there a reciprocal relationship? European
Journal of Surgical Oncology 21:290-296.

(1) Salam M, Rockett J and Morris A (1995). General primer-mediated polymerase chain reaction
for simultaneous detection and typing of HPV in laryngeal carcinomas. Clinical Otolaryngology
20:84-88.



(2) Articles Submitted To A Scientific Journal 

(4) John C. Rockett, Judith E. Schmid, Christopher J. Luft, J. Brian Garges, M. Stacey Ricci,
Pasquale Patrizio, Norman B. Hecht and David J. Dix. Gene Expression Patterns Associated with
Infertility in Rodent and Human Models. *An invited submission*

(3) Roger Ulrich, John C. Rockett, G. Gordon Gibson and Syril Pettit. Evaluating the Effects of 
Methapyrilene and Clofibrate on Hepatic Gene Expression: A Collaboration Between Laboratories 
and a Comparison of Platform and Analytical Approaches. 

(2) Valerie A Baker, Helen M Harries, Jeffrey F Waring, Roger Jolly, Angus de Souza, Judith E 
Schmid, Hong Ni, Roger Brown, Roger G Ulrich and John C. Rockett. Clofibrate-Induced Gene 
Expression Changes in Rat Liver: A Cross-Laboratory Analysis Using Membrane cDNA Arrays. 



(1) David Miller, Corrado Spadafora, David Dix, Adrian Platts, John C. Rockett, Stephen A.
Krawetz. Nuclease digestion of sperm chromatin suggests a

(3) Articles In Preparation For Submission To A Scientific Journal 

(3) Spearow J, DB Tully, JC Rockett and DJ Dix. Differential testicular gene expression in mouse
strains sensitive and resistant to endocrine disruption by estrogen.

(2) Sally D. Perrault, John C. Rockett, Laura Fenster James if«n«. mj^a., x> w 

(1) J. Christopher Luft, Douglas B. Tully, John C. Rockett, Judith E. Schmid and
David J. Dix. Reproductive and genomic effects in testes from mice exposed to the water
disinfectant byproduct bromochloroacetic acid.

(4) Book Chapters 

(4) John C Rockett. Gene Microarrays Applied to Reproductive Toxicology. In Cunningham 

S^ST.r ^TT^ 1 ™ 1 ™ ^ T ° XiCity ^ ™ e Hu ™» P-^ToZat 
Ireparation. M/f invited submission* 9 

(3) John C. Rockett and David J Dix. Gene Expression Networks. In Cooper (cd-uKhiefl- 
(2) John C. Rockett. The Future of Toxicogenomics. In Michael Burczynski (edV "An 

oesophageal c^inoma Peracchia A, Rosati R, Bonavina L, Bona S, Chella B Ss™, 
Advances in Diseases of the Esophagus. Bologna: Monduzzi Editore, pp45-49 (1996)

(5) Other Scientific Publications (Letters to Editors; Meeting Reports; Commentaries)

(11) John C. Rockett (2003). Probing the nature of microarray-based oligonucleotides. Drug
Discovery Today 8:389. (A Letter To The Editor)

(10) John C. Rockett (2003). To confirm or not to confirm (microarray data) - that is the question.
Drug Discovery Today 8(8):343. (A Letter To The Editor)

(9B) Nazzareno Ballatori, James L. Boyer, and John C. Rockett. (2003). Exploiting Genome Data
to Understand the Function, Regulation and Evolutionary Origins of Toxicologically Relevant
Genes. Environ Health Perspect. 111(6):871-5. (A Meeting Report)

(9A) Nazzareno Ballatori, James L. Boyer, and John C. Rockett. (2003). Exploiting Genome Data
to Understand the Function, Regulation and Evolutionary Origins of Toxicologically Relevant
Genes. EHP Toxicogenomics. 111(1T):61-5. (A Meeting Report)

(8) John C. Rockett (2002). Surrogate Tissue Analysis for Monitoring the Degree and Impact of
Exposures in Agricultural Workers. AgBiotechNet, 4:1-7 November, ABN 100. (A Review Article)
*An invited submission*

(7) John C. Rockett (2002). Macroresults through Microarrays. Drug Discovery Today, 7(15):804-

(6) John C Rockett (2002). Chip, chip, array! Three chips for post-genomic research. Drug
Discovery Today, 7(8):458-459. (A Meeting Report)

(5)3 ^n?;^T^° 2y Useof Genomic Data in Risk Assessment. Ge„omeBiology 3(4V 
R^) (htt P :// ge n O mehiolo P v.com/20n7nM/ rT ^ c /40n/, isOTiar H = n f Meeting 

(4) John C. Rockett (2001). Genomic and Proteomic Techniques Applied to Reproductive 
^fM^TeporT ® 4020 M ° 20 - 3 ^tt D :// g e nome h^1n P v ,com/2001/ ? yQ/ r . r ^ C Mn,n^ 

(3) John C. Rockett (2001). Chipping away at the mystery of drug responses. The
Pharmacogenomics Journal, 1(3):161-163. (A commentary) *An invited submission*

(2) John C. Rockett and David J. Dix (1999). U.S. EPA Workshop: Application of DNA arrays to
Toxicology. Environmental Health Perspectives, 107(8):681-685. (A Meeting Report)

(1) John C. Rockett III (1995). Immune recognition molecules and transforming growth factor 
0^b) a ° e$0P CanCCT ' ?h,D ' th6SiS ' UmVersity of Warwick ' Coventry, England.(/>/,.£. 



(6) Published Book, Paper and Website reviews 

(9) John C. Rockett (2002). A report on the manuscript: Systemic RNAi in C. elegans requires the
putative transmembrane protein SID-1. Winston WM, Molodowitch C, Hunter CP. Science 2002,
295:2456-2459. GenomeBiology, 3(7):reports0034.
(http://genomebiology.com/2002/3/7/reports/0034/)



(8) John C. Rockett (2001). A report on the manuscript: Genetic rescue of an endangered mammal
by cross-species nuclear transfer using post-mortem somatic cells. P Loi et al., Nat Biotechnol
2001, 19:962-964. GenomeBiology, 3(1):reports0006.
(http://genomebiology.com/2001/3/1/reports/0006/)

(7) John C. Rockett (2001). A report on the manuscript: Molecular Classification of Human
Carcinomas by Use of Gene Expression Signatures. A Su et al., Cancer Res. 2001, 61:7388-7393.
GenomeBiology, 3(1):reports0005. (http://genomebiology.com/2001/3/1/reports/0005/)

(6) John C. Rockett (2001). A report on the manuscript: Genetic evidence for two species of
elephant in Africa. A Roca et al., Science. 2001 Aug 24;293(5534):1473-7. GenomeBiology,
2(12):reports0045. (http://www.genomebiology.com/2001/2/12/reports/0045/)

(5) John C. Rockett (2001). A report on the manuscript: Extensive genetic polymorphism in the
human CYP2B6 gene with impact on expression and function in human liver. T Lang et al.,
Pharmacogenetics, 2001, 11(5):399-415. GenomeBiology, 2(12):reports0044.
(http://www.genomebiology.com/2001/2/12/reports/0044/)

(4) John C. Rockett (2001). A report on the manuscript: Novel Human Testis-Specific cDNA:
Molecular Cloning, Expression and Immunological Effects of the Recombinant Protein. R
Santhanam and R K Naz, Molecular Reproduction and Development 60:1-12 (2001).
GenomeBiology, 2(11):reports0040. (http://genomebiology.com/2001/2/11/reports/0040/)

(3) John C. Rockett (2001). A report on the website: BIND - The Biomolecular Interaction
Network Database (http://www.bind.ca/). GenomeBiology, 2(9):reports2011.
(http://www.genomebiology.com/2001/2/9/reports/2011/)

(2) John C. Rockett (2001). A report on the manuscript: Exploring the DNA-binding specificities
of zinc fingers with DNA microarrays. ML Bulyk et al., Proc Natl Acad Sci USA 2001,
98:7158-7163. GenomeBiology, 2(10):reports0032. (http://genomebiology.com/2001/2/10/reports/0032/)

(1) J Rockett (1996). A Book Review on: "Cell Adhesion and Cancer" (Eds., Hogg N and Hart I).
Clinical Molecular Pathology 49(1):M64. *An invited submission*



(7) Published Abstracts of Poster and Oral Presentations 

(17) Amber K. Goetz, Wenjun Bao, Judith E. Schmid, Carmen Wood, Hongzu Ren, Deborah S.
Best, Rachel N. Murrell, John C. Rockett, Michael G. Narotsky, Douglas C. Wolf, Douglas B.
Tully, David J. Dix. Gene Expression Profiling in Testis and Liver of Mice to Identify Modes of
Action of Conazole Toxicities. Society of Toxicology 43rd Annual Meeting, March 21st-25th, 2004,
Baltimore, MD, USA. Toxicological Sciences. (Submitted)

(16) Jane Gallagher, Theresa Lehman, Ramakrishna Modali, Scott Rhoney, Marien Clas, Jeff
Inmon, John C. Rockett, David Dix, Cindy Mamay, Suzanne Fenton, Suzanne McMaster, Stan
Barone Jr., Pauline Mendola and Reeder Sams. Validation of Non-Invasive Biological Samples:
Pilot Projects Relevant to the National Children Study. Society of Toxicology 43rd Annual Meeting,
March 21st-25th, 2004, Baltimore, MD, USA. Toxicological Sciences. (Submitted)

(15) B.S. Pukazhenthi, J. C. Rockett, M. Ouyang, D.J. Dix, J.G. Howard, P. Georgopoulos, W. J.
Welsh and D. E. Wildt. Gene Expression In The Testis Of Normospermic Versus Teratospermic
Domestic Cats Using Human cDNA Microarray Analyses. Society for the Study of Reproduction
36th Annual Meeting, July 19th-22nd, 2003, Cincinnati, OH, USA. Biology of Reproduction 68 (Supp

(14) David J. Dix and John C. Rockett (2003). Genomic and Proteomic Analysis of Surrogate
Tissues for Assessing Toxic Exposures and Disease States. Innovations in Applied Toxicology
symposium entitled "Genomic and Proteomic Analysis of Surrogate Tissues for Assessing Toxic
Exposures and Disease States". Society of Toxicology 42nd Annual Meeting, March 9th-13th, 2003,
Salt Lake City, UT, USA. Toxicological Sciences 72(S-1):276.

(13) John C. Rockett, Chad R. Blystone, Amber K. Goetz, Rachel N. Murrell, Judith E. Schmid
and David J. Dix. (2003). Gene Expression Profiling Of Accessible Surrogate Tissues To Monitor
Molecular Changes In Inaccessible Target Tissues Following Toxicant Exposure. Innovations in
Applied Toxicology Symposium entitled "Genomic and Proteomic Analysis of Surrogate Tissues
for Assessing Toxic Exposures and Disease States". Society of Toxicology 42nd Annual Meeting,
March 9th-13th, 2003, Salt Lake City, UT, USA. Toxicological Sciences 72(S-1):276.

(12) Douglas B. Tully, J. Christopher Luft, John C. Rockett, Judy E. Schmid and David J. Dix
(2002). Effects on gene expression in testes from adult male mice exposed to the water disinfectant
byproduct bromochloroacetic acid. Society for the Study of Reproduction 35th Annual Meeting, July
28-31, 2002, Baltimore, Maryland, USA. Biology of Reproduction 66 (Supp 1):223.

i! J " E " Th0m P S0n ' John C Rocke tt, Judith E. Schmid, Robert J. Goodrich 

David Miller, G. Charles Ostermeier and Stephen A. Krawetz (2002). Testis and spermatazola RNA 
P* 0 " fertlle men - Society for the Study of Reproduction 35'" Annual Meeting July 28- 
31, 2002, Baltimore. Maryland, USA. Biology of Reproduction 66 (Supp 1 ) : 1 94. 

(10) Asa J. Oudes, John C. Rockett, David J. Dix and Kwan Hee Kim (2002). Identification of
retinoic acid induced genes in mouse testis by cDNA microarray analysis. 27th Annual Meeting of
the American Society of Andrology, 4/24-27/02. J. Andrology Supplement March/April.

(9) John C. Rockett, Robert J. Kavlock, Christy Lambright, Louise G. Parks, Judith E. Schmid,
Vickie S. Wilson and David J. Dix (2002). Use of DNA arrays to monitor gene expression in blood
and uterus from Long-Evans rats following 17-β-estradiol exposure - a new approach to
biomonitoring endocrine disrupting chemicals using surrogate tissues. Toxicological Sciences 66(1):
Abstract No. 1388.

(8) David J. Dix and John C. Rockett (2002). Genomic analysis of the testicular toxicity of
haloacetic acids. Platform presentation at the symposium, "Defining the cellular and molecular
mechanisms of toxicant action in the testis". Toxicological Sciences 66 (1): Abstract No. 848.

(7) JC Rockett, JC Luft, JB Garges and DJ Dix (2001). The reproductive effects of the water
disinfectant byproduct bromochloroacetate on juvenile and adult male mice. Toxicological
Sciences, 60 (1):250.

(6) Tarka DK, Klinefelter GR, Rockett JC, Suarez JD, Roberts NL and Rogers JM (2001). Effect
of gestational exposure to ethane dimethane sulfonate (EDS), bromochloroacetic acid (BCA) and
molinate on reproductive function in CD-1 male mice. Toxicological Sciences, 60 (1):250.

(5) Garges JB, Rockett JC and Dix DJ (2001). Developmental and reproductive phenotype of mice 

Sh ° Ck Pr ° teinS (HSP7 ° S) - Toxic ° l °^ ScieLs, 66(1):3 8 T 

(4) D Dix, J Rockett, J Luft, J Garges, M Ricci, P Patrizio and N Hecht (2000). Using DNA
microarrays to characterise gene expression in testes of fertile and infertile humans and mice.
Biology of Reproduction, 62 (s1):227.

(3) J Luft, J B Garges, J Rockett and D Dix (2000). Male reproductive toxicity of
bromochloroacetic acid in mice. Biology of Reproduction, 62 (s1):246.

(2) Rockett, JC, Garges JB and Dix, DJ (2000). A single heat-shock of juvenile male mice causes
a long-term decrease in fertility and reduces embryo quality. Toxicological Sciences 54 (1):365.

(1) JC Rockett, SJ Darnton, J Crocker, HR Matthews and AG Morris (1994). Major
Histocompatibility (MHC) class I and II and intercellular adhesion molecule (ICAM)-1 expression
in oesophageal carcinoma (OC). Immunology 83 (s1):64.



(8) Invited Oral Presentations 



(10) John C. Rockett and Gary M Hellmann. To confirm or not to confirm (microarray data) - 
^i££1£™ Gen ° m,CS ^ Pr ° te0miCS C — ' S 

(9) John C Rockett. "Biomonitoring Toxicant Exposure and Effect Using Toxicogenomics and
Surrogate Tissue Analysis". Seminar for Division of Epidemiology, Statistics and Prevention
Research, National Institute of Child Health and Development, May 29th, 2003, Rockville, MD,

flvvlv' R0Cl 5 e o- l G ! n ° mic5 and 'Atomics: New Toxicity Testing". Platform presentation at 
GA USA Assessors Annual Conference, April 28 ,h - May 2 nd , 2003, Stone Mountain, 

(7) John C. Rockett, Chad R. Blystone, Amber K. Goetz, Rachel N. Murrell, Judith E. Schmid and
David J. Dix. "Gene Expression Profiling Of Accessible Surrogate Tissues To Monitor Molecular
Changes in Inaccessible Target Tissues Following Toxicant Exposure." Platform presentation at
SoT 42nd Annual Meeting symposium entitled "Genomic and Proteomic Analysis of Surrogate
Tissues for Measuring Toxic Exposures and Drug Action", March 9th-13th, 2003, Salt Lake City,
UT, USA.

(6) John C. Rockett. "A Toxicogenomic Approach to Surrogate Tissue Analysis". Seminar for
Department of Environmental and Molecular Toxicology, North Carolina State University,
September 3rd, 2002, Raleigh, NC, USA.

(5) John C. Rockett. "Differential gene expression in toxicology: practicalities, problems and
potential". Platform presentation at the Annual Mount Desert Island Biological Laboratory
Environmental Health Sciences Symposium: Exploiting Genome Data to Understand the Function,
Regulation and Evolutionary Origins of Toxicologically Relevant Genes, July 10th-11th, 2002,
Salisbury Cove, Maine, USA.

(4) John C. Rockett, Leroy Folmar, Michael J. Hemmer and David J. Dix. "Arrays for
biomonitoring environmental and reproductive toxicology". Platform Presentation at Macroresults
Through Microarrays 3 - Advancing Drug Development, April 29th-May 1st, 2002, Boston, MA,
USA.

(3) John C. Rockett, Sigmund Degitz, Suzanne E. Fenton, Leroy Folmar, Michael J. Hemmer, Joe
E. Tietge, and David J. Dix. "Use of DNA Arrays in Environmental Toxicology". Platform
presentation at the 4th Annual Lab-on-a-Chip and Microarrays for Post-Genomic Applications
meeting, January 14th-16th, 2002, Zurich, Switzerland.

(2) John C. Rockett. "DNA Arrays". Seminar at EPA Molecular Biology Course, April 8th, 1999,
USEPA, RTP, NC, USA.

(1) John C. Rockett. "Contract Services for Array Applications". Seminar at the Triangle Array
Users Group, May 1st, 1999, CUT, RTP, NC, USA.



(9) Other Poster and Oral Presentations 

(23) John C. Rockett, Wenjun Bao, Chad R. Blystone, Amber K. Goetz, Rachel N. Murrell,
Hongzu Ren, Judith E. Schmid, Jessica Stapelfeldt, Lillian F. Strader, Kary E. Thompson and David
J. Dix. Genomic Analysis of Surrogate Tissues for Assessing Environmental Exposures and Future
Disease States. ILSI-HESI meeting: Toxicogenomics in Risk Assessment - Assessing the Utility,
Challenges, and Next Steps. June 5th-6th, 2003, Fairfax, VA, USA.

(22) John C. Rockett, Wenjun Bao, Chad R. Blystone, Amber K. Goetz, Rachel N. Murrell,
Hongzu Ren, Judith E. Schmid, Jessica Stapelfeldt, Lillian F. Strader, Kary E. Thompson and David
J. Dix. Genomic Analysis of Surrogate Tissues for Assessing Environmental Exposures and Future
Disease States. EPA Science Forum, May 5th-7th, 2003, Washington, D.C., USA.

(21) Germaine Buck, Courtney Johnson, Joseph Stanford, Anne Sweeney, Laura Schieve, John
Rockett, Sherry Selevan and Steve Schrader. Prospective Pregnancy Study Designs for Assessing
Reproductive and Developmental Toxicants. American Epidemiology Society Meeting, March
27th-28th, 2003, Atlanta, GA, USA.

(20) John C. Rockett, Chad R. Blystone, Amber K. Goetz, Rachel N. Murrell, Hongzu Ren, Judith 
E. Schmid, Jessica Stapelfeldt, Lillian F. Strader, Kary E. Thompson, Doug B. Tully, Paul Zigas 
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_ y — - " — ****** warm 

Si*" 1 ** * Atfip fist shows homology «*, hOe 
(M). To date tie con si sts S7E23 sequence and 
crwpttantfMstjBOnuttbn. pclynarochah 
"Motion (PC R) prim ers <5'-TCGGAAGACCTCAT- 
1U I UJ I CATTTTGATATTCCTG- TGTAGATTG- 
TAC7GAGAG7GCAC^;«5'-GCTACAAACAGC. 
GTCQA CTTGM TGOOCQGACATCTTCQACTGT- 
GCGGTATTTCACAtXG-30 wore used id W 
the (MS asquance of pRS3.6, and ff» ration 
protomtvislorrTwdktDyeasttoont-stepg^ 
n aj aj t a n a n FL Rothflvv MtKhodS avymol 104 
281 (1991J. To create the **/ ^*.-L£U? rruttfcn corv' 
taned on pi 14, ■ SXVkb Sat I .ragman* from pAXtr 
*»» cloned Ho pUCia and on intern* 4.0-fct> Hoa 
W0» I fragment hbs njpJaoed wflfi a fragment. 
Jocortrtua the tttffl^iajB aaete (a deattcn eor- 
rssponcing to 031 amino acids} canted on pi 53. a 
LBJ2 fragment was used to reptace the 2^*b Prrt 
*-Ecfl36 1 tagrnant of S7E23, wfiicn occurs within a 

PSP72 (Frontage). To create YtpMFAl, a 1.6-kfa 
Bam H fragment oontanng 7, torn pKKi6 (K. 

«■ E. Sterna, J. Thcrner. EMBOd a, 3973 
f?8e9B. was igatad into the Bam HI site of V&351 Lj 

24. J. Chant and I. Herskowitz, Cell 65, 1203 (1991).

25. B. W. Matthews, Acc. Chem. Res. 21, 333 (1988).

26. K. Kuchler, H. G. Dohlman, J. Thorner, J. Cell Biol. 120, 1203 (1993); R. Kolling and
C. P. Hollenberg, EMBO J. 13, 3261 (1994); C. Berkower, D. Loayza, S. Michaelis,
Mol. Biol. Cell 5, 1185 (1994).

27. A. Bender and J. R. Pringle, Proc. Natl. Acad. Sci. U.S.A. 86, 9976 (1989); J. Chant,
K. Corrado, J. R. Pringle, I. Herskowitz, Cell 65, 1213 (1991); S. Powers, E. Gonzales,
T. Christensen, J. Cubert, D. Broek, ibid., p. 1225; H. O. Park, J. Chant, I. Herskowitz,
Nature 365, 269 (1993); J. Chant, Trends Genet. 10, 328 (1994); J. Chant and
J. R. Pringle, J. Cell Biol. 129, 751 (1995); J. Chant, M. Mischke, E. Mitchell,
I. Herskowitz, J. R. Pringle, ibid., p. 767.

28. G. F. Sprague Jr., Methods Enzymol. 194, 77 (1991).

29. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys;
D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro;
Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.
30. AW303 1A derivatKre, SY2625 (M47a in3-r 

t*3&rJVSl-HS3l. was tiepvant strah tor »w mutant 
aean ^SY262 Sdarwa»aatar»iernal^«^^ 
cnHadph«uiu»aaMy»,aT>d«TopUB«K»»»q»- 
nwta nobded tie tatowkig atrars: Y49 (rta2?.T), 
V115 (majJA.ia«. Y142 (n/J.vC«43), ri73 
WLXBJSU Y220U^rOW«B23A.^R4J). Y221 
(«a23Ar.tR43». Y231 (a*JA:lHJ(? **23a^£LB). 
andms (sta23a.1fljej. M47o danvatiwas erf 
|J2fi5 hdudad tha tolowng strains: Y199 
(SY2625 made A447aX Y276 (tra22-7). Y195 
f 1 ?'^^). Y196 to/1t::LBJ2L and VI 97 
(aitf7.vWM3). The 6G123 (M47atei?tfa3 trpl ani 

strains tar analysis of bud cfte aalaction. EQ1230*. 
rtvatwas indudad the totowtag stnuns: Y175 
WLrLBJZl, Y223 («fl^«4J). Y234 frttfSA^ 
and Y272 (wrffa^fUt? »(a23^;Uua. 
W4To dertva&vn of EG 123 ndudad Via foiowv* 
stTm: V214 (EG123 made M47o) and Y203 
(sxff A.7L£U?). AS am wara genraad by means 
of standard genetic or molecular methods frM*v*tt 
lha appropriate conttructi (23). in panicutar, the ax/r 
«f«23 double mutant atraht w«e cnatad byotaa. 
ing of the appropnste Ai47a sta23 and A44Ta txti 
mutants, followed by sporutation of tha ramajntolp. 
bid and sciabon of the double rnutant tiom nonpa. 
«ntt di-typa tetrads. Gene dtarupdons were oorv 
wmad with either PCfi or Southern (DNA) analysis. 
p129 is a YEp352 (X E. Ha\ A> M Myaa, T. JL Ko- 
ernar,A.TagololI, Ymstt 163n9B6Bpterr*JoBn- 
ttnrig a S^-kto Sal I sagmant of pWl, p15l was 
derived torn pi29 by rvartion of a Inker at theBpfl 
sitewithhAW.;, which tod to an WnmrBertoTof 
5whernaggUWn(H^epiicpaTO^^ 
between amino adds 654 and 655 rftheAXI/cfod- 



wareaea^w<mfTeusertpC2258rxJ 

Gg5^ACOGGC-3'; «r;^7 M , V^SaATCAT- 
^ATpATGT^TCACAAAGCT^ The 

carrying dtfavant AXL1 aWes. 0124 taxtl^RAn 

0132 

«eo ana- replacement of the pisi Bam w-Msc | 
I to generate pi 61 W-eriA), pi62 (axn- 
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glass were used for quaSLvT e^es^^ r S 9 d rt 0NA * °" 

Because of the small format and high density of the arrays, hybridization volumes of 2
microliters could be used that enabled detection of rare transcripts in probe mixtures
derived from 2 micrograms of total cellular messenger RNA. Differential expression
measurements of 45 Arabidopsis genes were made by means of simultaneous, two-color
fluorescence hybridization.

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The temporal, developmental, topographi- 
cal, histological, and physiological patterns 
in which a gene is expressed provide clues to
its biological role. The large and expanding
database of complementary DNA (cDNA)
sequences from many organisms (1) presents
the opportunity of defining these patterns at
the level of the whole genome.

For these studies, we used the small flowering plant Arabidopsis thaliana as a model
organism. Arabidopsis possesses many advantages for gene expression analysis, including
the fact that it has the smallest genome of any higher eukaryote examined to date (2).
Forty-five cloned Arabidopsis cDNAs (Table 1), including 14 complete sequences and 31
expressed sequence tags (ESTs), were used as gene-specific targets. We obtained the ESTs
by selecting cDNA clones at random from an Arabidopsis cDNA library. Sequence analysis
revealed that 28 of the 31 ESTs matched sequences

£ScrwM«nB R. Davis. Deoamnen, oi Bocfwmstry 
|*"5«rt»**«B Msckcal trettute, B«c*nan Cent* 
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in the database (Table 1). Three additional 
cDNAs from other organisms served as controls
in the experiments.

The 48 cDNAs, averaging ~1.0 kb, were amplified with the polymerase chain reaction (PCR)
and deposited into individual wells of a 96-well microtiter plate. Each sample was
duplicated in two adjacent wells to allow the reproducibility of the arraying and
hybridization process to be tested. Samples from the microtiter plate were printed onto
glass microscope slides in an area measuring 3.5 mm by 5.5 mm with the use of a
high-speed arraying machine (3). The arrays were processed by chemical and heat treatment
to attach the DNA sequences to the glass surface and denature them (3). Three arrays,
printed in a single lot, were used for the experiments here. A single microtiter plate of
PCR products provides sufficient material to print at least 500 arrays.
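
The plate layout described above (48 cDNAs, each in two adjacent wells of a 96-well plate) can be sketched in a few lines of Python. This is only an illustration: the function name `duplicate_layout` and the row-by-row well assignment are assumptions, since the text does not say which wells were used for which sample.

```python
import string


def duplicate_layout(n_samples=48, n_rows=8, n_cols=12):
    """Assign each sample to two adjacent wells in the same plate row.

    The row-by-row ordering is a hypothetical choice made for illustration;
    the source only states that duplicates occupied adjacent wells.
    """
    if n_samples * 2 > n_rows * n_cols:
        raise ValueError("plate too small for duplicated samples")
    rows = string.ascii_uppercase[:n_rows]
    pairs_per_row = n_cols // 2
    layout = {}
    for i in range(n_samples):
        row, pair = divmod(i, pairs_per_row)
        first_col = 2 * pair + 1
        layout[i + 1] = (f"{rows[row]}{first_col}", f"{rows[row]}{first_col + 1}")
    return layout


layout = duplicate_layout()
all_wells = [well for pair in layout.values() for well in pair]
```

With the default arguments, the 48 duplicated samples fill all 96 wells exactly once, which matches the counts given in the text.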

Fluorescent probes were prepared from total Arabidopsis mRNA (4) by a single round of
reverse transcription (5). The Arabidopsis mRNA was supplemented with human acetylcholine
receptor (AChR) mRNA at a dilution of 1:10,000 (w/w) before cDNA synthesis, to provide an
internal standard for calibration. The resulting fluorescently labeled cDNA mixture was
hybridized to an array at high stringency (6) and scanned with a laser (3). A
high-sensitivity scan gave signals that saturated the detector at nearly all of the
Arabidopsis target sites (Fig. 1A). Calibration relative to the AChR mRNA standard
(Fig. 1A) established a sensitivity limit of ~1:50,000. No detectable hybridization was
observed to either the rat glucocorticoid receptor (Fig. 1A) or the yeast TRP4 (Fig. 1A)
targets even at the highest scanning sensitivity. A moderate-sensitivity scan of the same
array allowed linear detection of the more abundant transcripts (Fig. 1B). Quantitation of
both scans revealed a range of expression levels spanning three orders of magnitude for
the 45 genes tested (Table 2). RNA blots (7) for several genes (Fig. 2) corroborated the
expression levels measured with the microarray to within a factor of 5 (Table 2).
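
The calibration against the AChR internal standard can be illustrated with a short Python sketch. It assumes that signal is proportional to abundance within the linear range of the scan; the function names and the example signal values are hypothetical and are not taken from the article.

```python
def abundance_from_spike(signal, spike_signal, spike_dilution=1 / 10_000):
    """Estimate relative abundance (w/w) of a transcript from its spot signal.

    Assumes a linear detector response, so that
    abundance = spike_dilution * (signal / spike_signal).
    """
    if signal < 0 or spike_signal <= 0:
        raise ValueError("signals must be non-negative and the spike signal positive")
    return spike_dilution * (signal / spike_signal)


def as_ratio(abundance):
    """Format an abundance as a 1:N ratio, the notation used in the text."""
    return f"1:{round(1 / abundance):,}"


# Hypothetical signals in arbitrary fluorescence units.
twice_the_spike = as_ratio(abundance_from_spike(200.0, 100.0))
fifth_of_the_spike = as_ratio(abundance_from_spike(20.0, 100.0))
```

Under these assumptions, a spot twice as bright as the 1:10,000 standard corresponds to 1:5,000, and a spot one fifth as bright corresponds to 1:50,000.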

Differential gene expression was investigated with a simultaneous, two-color
hybridization scheme, which served to minimize experimental variation inherent in the
comparison of independent hybridizations. Fluorescent probes were prepared from two
mRNA sources with the use of reverse transcriptase in the presence of fluorescein- and
lissamine-labeled nucleotide analogs, respectively (5). The two probes were then mixed
together in equal proportions, hybridized to a single array, and scanned separately for
fluorescein and lissamine emission after independent excitation of the two
fluorophores (3).

To test whether overexpression of a single gene could be detected in a pool of total
Arabidopsis mRNA, we used a microarray to analyze a transgenic line overexpressing the
single transcription factor HAT4 (8). Fluorescent probes representing mRNA from
wild-type and HAT4-transgenic plants were labeled with fluorescein and lissamine,
respectively; the two probes were then mixed and hybridized to a single array. An intense
hybridization signal was observed at the position of the HAT4 cDNA in the
lissamine-specific scan (Fig. 1D), but not in the fluorescein-specific scan of the same
array (Fig. 1C). Calibration with AChR mRNA added to the fluorescein and lissamine cDNA
synthesis reactions at dilutions of 1:10,000 (Fig. 1C) and 1:100 (Fig. 1D), respectively,
revealed a 50-fold elevation of HAT4 mRNA in the transgenic line relative to its
abundance in wild-type plants (Table 2). This magnitude of HAT4 overexpression matched
that inferred from the Northern (RNA) analysis within a factor of 2 (Fig. 2 and Table 2).
Expression of all the other genes monitored on the array differed by less than a factor of
5 between HAT4-transgenic and wild-type plants (Fig. 1, C



Fig. 1. Gene expression monitored with the use of cDNA microarrays. Numbers and letters on
the axes mark the position of each cDNA. (A) Array scanned at high sensitivity. (B) Same
array as in (A) but scanned at moderate sensitivity. (C and D) A single array was probed
with fluorescein-labeled cDNA from wild-type plants and lissamine-labeled cDNA from
HAT4-transgenic plants. The single array was then scanned successively to detect the
fluorescein fluorescence corresponding to mRNA from wild-type plants (C) and the lissamine
fluorescence corresponding to mRNA from HAT4-transgenic plants (D). (E and F) A single
array was probed with fluorescein-labeled cDNA from root tissue and lissamine-labeled cDNA
from leaf tissue. The single array was then scanned successively to detect the fluorescein
fluorescence corresponding to mRNA from root tissue (E) and the lissamine fluorescence
corresponding to mRNA from leaf tissue (F).

Fig. 2. Gene expression monitored with RNA (Northern) blot analysis. Designated amounts of
mRNA from wild-type and HAT4-transgenic plants were spotted onto nylon membranes and
probed with the cDNAs indicated. Purified human AChR mRNA was used for calibration.



and D, and Table 2). Hybridization of fluorescein-labeled glucocorticoid receptor
cDNA (Fig. 1C) and lissamine-labeled TRP4 cDNA (Fig. 1D) verified the presence of the
negative control targets and the lack of optical cross talk between the two
fluorophores.

To explore a more complex alteration in expression patterns, we performed a second
two-color hybridization experiment with fluorescein- and lissamine-labeled probes
prepared from root and leaf mRNA, respectively. The scanning sensitivities for the two
fluorophores were normalized by matching the signals resulting from AChR mRNA, which was
added to both cDNA synthesis reactions at a dilution of 1:1000 (Fig. 1, E and F). A
comparison of the scans revealed widespread differences in gene expression between root
and leaf tissue (Fig. 1, E and F). The mRNA from the light-regulated CAB1 gene was
~500-fold more abundant in leaf (Fig. 1F) than in root tissue (Fig. 1E). The expression of
26 other genes differed between root and leaf tissue by more than a factor of 5 (Fig. 1,
E and F).
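
The normalization step described above, in which the two fluorophore channels are matched on the AChR signal before genes are compared, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the gene labels `geneX` and `geneY`, and all signal values are invented for the example and do not come from the article; only the factor-of-5 threshold and the use of AChR as the shared standard are taken from the text.

```python
def normalize_channels(reference, other, spike):
    """Rescale `other` so the spike-in control reads the same in both channels."""
    scale = reference[spike] / other[spike]
    return {gene: value * scale for gene, value in other.items()}


def differential(reference, other_normalized, threshold=5.0, exclude=()):
    """Return genes whose channel ratio differs by at least `threshold`-fold."""
    hits = {}
    for gene, ref_value in reference.items():
        if gene in exclude:
            continue
        ratio = other_normalized[gene] / ref_value
        if ratio >= threshold or ratio <= 1 / threshold:
            hits[gene] = ratio
    return hits


# Hypothetical spot intensities (arbitrary units) for the two scans.
root = {"AChR": 100.0, "CAB1": 2.0, "geneX": 50.0, "geneY": 40.0}
leaf = {"AChR": 50.0, "CAB1": 500.0, "geneX": 30.0, "geneY": 2.0}

leaf_norm = normalize_channels(root, leaf, spike="AChR")
hits = differential(root, leaf_norm, exclude=("AChR",))
```

In this made-up data set the leaf channel is scaled by 2 so that AChR matches, after which `CAB1` (500-fold higher in leaf) and `geneY` (10-fold lower in leaf) exceed the factor-of-5 threshold while `geneX` does not.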

The HAT4-transgenic line we examined has elongated hypocotyls, early flowering, poor
germination, and altered pigmentation (8). Although changes in expression were observed
for HAT4, large changes in expression were not observed for any of the other 44 genes we
examined. This was somewhat surprising, particularly because comparative analysis of leaf
and root tissue identified 27 differentially expressed genes. Analysis of an expanded set
of genes may be required to identify genes whose expression changes upon HAT4
overexpression; alternatively, a comparison of mRNA populations from specific tissues of
wild-type and HAT4-transgenic plants may allow identification of downstream genes.

Table 1.

cDNA      Function
AChR      Human AChR
EST3      Actin
EST6      NADH dehydrogenase
AAC1      Actin 1
EST12     Unknown
EST13     Actin
CAB1      Chlorophyll a/b binding
EST17     Phosphoglycerate kinase
GA4       Gibberellic acid biosynthesis
EST19     Unknown
GBF-1     G-box binding factor 1
EST23     Elongation factor
EST29     Aldolase
GBF-2     G-box binding factor 2
EST34     Chloroplast protease
EST35     Unknown
EST41
rGR       Rat glucocorticoid receptor
EST42     Unknown
EST45     ATPase
HAT1      Homeobox-leucine zipper 1
EST46     Light harvesting complex
EST49     Unknown
HAT2      Homeobox-leucine zipper 2
HAT4      Homeobox-leucine zipper 4
EST50     Phosphoribulokinase
HAT5      Homeobox-leucine zipper 5
EST51     Unknown
HAT22     Homeobox-leucine zipper 22
EST52     Oxygen evolving
EST59     Unknown
KNAT1     Knotted-like homeobox 1
EST60     RuBisCO small subunit
EST69     Translation elongation factor
PPH1      Protein phosphatase 1
EST70     Unknown
EST75     Chloroplast protease
EST78     Unknown
ROC1      Cyclophilin
EST82     GTP binding
EST83     Unknown
EST84     Unknown
EST91     Unknown
EST96     Unknown
SAR1      Synaptobrevin
EST100    Light harvesting complex
EST103    Light harvesting complex
TRP4      Yeast tryptophan biosynthesis

With the high density of robotic printing, it is feasible to scale up the fabrication
process to produce arrays containing 20,000 cDNA targets. At this density, a single array
would be sufficient to provide gene-specific targets encompassing nearly the entire
repertoire of expressed genes in the Arabidopsis genome (2). The availability of 20,274
ESTs from Arabidopsis (1, 9) would provide a rich source of templates for such studies.

The estimated 100,000 genes in the human genome (10) exceeds the number of Arabidopsis
genes by a factor of 5 (2). This modest increase in complexity suggests that similar cDNA
microarrays, prepared from the rapidly growing repertoire of human ESTs (1), could be used
to determine the expression patterns of tens of thousands of human genes in diverse cell
types. Coupling an amplification strategy to the reverse transcription reaction (11)
could make it feasible to monitor expression even in minute tissue samples. A wide variety
of acute and chronic physiological and pathological conditions might lead to
characteristic changes in the patterns of gene expression in peripheral blood cells or
other easily sampled tissues. In concert with cDNA microarrays for monitoring complex
expression patterns, these tissues might therefore serve as sensitive in vivo sensors for
clinical diagnosis. Microarrays of cDNAs could thus provide a useful link between human
gene sequences and clinical medicine.

Table 2. Gene expression monitoring by microarray and RNA blot analyses.
See Table 1 for additional gene information. [illegible] of known amounts of human
AChR mRNA. Values for the microarray were determined from microarray scans (Fig. 1);
values for the RNA blot were determined from RNA blots (Fig. 2).

Gene         Microarray   RNA blot
CAB1         1:48         1:83
CAB1 (tg)    1:120        1:150
HAT4         1:8300       1:6300
HAT4 (tg)    1:150        1:210
ROC1         1:1200       1:1800
ROC1 (tg)    1560         1:1300
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Gene Therapy in Peripheral Blood 

Lymphocytes and Bone Marrow for
ADA- Immunodeficient Patients

Claudio Bordignon, Luigi D. Notarangelo, Nadia Nobili,

Daniela Maggioni, Claudia Rossi, Paolo Servida 
Alberto G. Ugazio, Fulvio Mavilio 

Two different retroviral vectors were used to transfer ex vivo the human ADA
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Severe combined immunodeficiency asso- 
ciated with inherited deficiency of ADA 
(1) is usually fatal unless affected children
are kept in protective isolation or the immune
system is reconstituted by bone marrow
transplantation from a human leukocyte
antigen (HLA)-identical sibling donor
(2). This is the therapy of choice, although
it is available only for a minority of patients.
In recent years, other forms of therapy have
been developed, including transplants from
haploidentical donors (3, 4), exogenous enzyme
replacement (5), and somatic-cell
gene therapy (6-9).

We previously reported a preclinical model
in which ADA gene transfer and expression

P. Servida, F. Mavilio, Telethon Gene Therapy Program
for Genetic Diseases, DIBIT, Istituto Scientifico H. S. Raffaele,
Milan, Italy.
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Bresca, Defy. 
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•nrtcoK S. Raftaete. Mian. Italy. 
P. Panina, Roche Milano Ricerche, Milan, Italy.
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successfully restored immune functions in hu- 
man ADA-deficient (ADA-) peripheral
blood lymphocytes (PBLs) in immunodeficient
mice in vivo (10, 11). On the basis of
these preclinical results, the clinical application
of gene therapy for the treatment of
ADA- SCID (severe combined immunodeficiency
disease) patients who previously failed
exogenous enzyme replacement therapy was
approved by our Institutional Ethical Committees
and by the Italian National Committee
for Bioethics (12). In addition to evaluating
the safety and efficacy of the gene therapy
procedure, the aim of the study was to define
the relative role of PBLs and hematopoietic
stem cells in the long-term reconstitution of
immune functions after retroviral vector-mediated
ADA gene transfer. For this purpose,
two structurally identical vectors expressing
the human ADA complementary DNA
(cDNA), distinguishable by the presence of
alternative restriction sites in a nonfunctional
region of the viral long-terminal repeat
(LTR), were used to transduce PBLs and bone
marrow (BM) cells independently. This procedure
allowed identification of the origin of
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Introduction 



It is now apparent that the development of almost all cancers and many non-neoplastic
diseases is accompanied by altered gene expression in the affected cells
compared to their normal state (Hunter 1991, Wynford-Thomas 1991, Vogelstein
and Kinzler 1993, Semenza 1994, Cassidy 1995, [illegible] and Van Heyningen 1998).
Such changes also occur in response to external stimuli such as pathogenic
micro-organisms [illegible] 1995, Dogra et al. 1998, Ramana and Kohli
1998), as well as during the development of undifferentiated cells (Hecht 1998,
Rudin and Thompson 1998, Schneider-Maunoury et al. 1998). The potential
medical and therapeutic benefits of understanding the molecular changes which
occur in any given cell in progressing from the normal to the 'altered' state are
enormous. Such profiling essentially provides a 'fingerprint' of each step of a

US Environmental Protection Agency, National Health and Environmental Effects
Research Laboratory, Reproductive Toxicology Division, Research Triangle Park, NC 27711,
USA.

† Rhone-Poulenc Agrochemicals, Toxicology Department, Sophia-Antipolis, Nice, France
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cell s development or response and should help in the elucidation of specific and 

[illegible]

exposure to certain classes of chemicals that are enzyme inducers.

In drug metabolism, many of the xenobiotic-metabolizing enzymes (including
the well-characterized isoforms of cytochrome P450) are inducible by drugs and
chemicals in man (Pelkonen et al. 1998), predominantly involving transcriptional
activation of not only the cognate cytochrome P450 genes, but additional cellular
proteins which may be crucial to the phenomenon of induction. Accordingly, the
development of methodology to identify and assess the full complement of genes
that are either up- or down-regulated by inducers is crucial in the development of
knowledge to understand the precise molecular mechanisms of enzyme induction
and how this relates to drug action. Similarly, in the field of chemical-induced
toxicity, it is now becoming increasingly obvious that most adverse reactions to
drugs and chemicals are the result of multiple gene regulation, some of which are
causal and some of which are casually-related to the toxicological phenomenon per se.
This observation has led to an upsurge in interest in gene-profiling technologies
which differentiate between the control and toxin-treated gene pools in target tissues
and is, therefore, of value in rationalizing the molecular mechanisms of xenobiotic-
induced toxicity. Knowledge of toxin-dependent gene regulation in target tissues is
not solely an academic pursuit, as much interest has been generated in the
pharmaceutical industry to harness this technology in the early identification of toxic
drug candidates, thereby shortening the developmental process and contributing
substantially to the safety assessment of new drugs. For example, if the gene profile
in response to, say, a testicular toxin that has been well-characterized in vivo could be
determined in the testis, then this profile would be representative of all new drug
candidates which act via this specific molecular mechanism of toxicity [illegible]

Whereas it would be informative to know the identity and functionality of all genes
up/down regulated by such toxicants, this would appear a longer term goal, as
[illegible] genes have been sequenced, far less their functionality determined.
However, the current use of gene profiling yields a pattern of gene
changes for a xenobiotic of unknown toxicity which may be matched to that of well-
characterized toxins, thus alerting the toxicologist to possible in vivo similarities
between the unknown and the standard, thereby providing a platform for more
extensive toxicological examination. Such approaches are beginning to gain
momentum, in that several biotechnology companies are commercially producing
'gene chips' or 'gene arrays' that may be interrogated for toxicity assessment of
xenobiotics. These chips consist of hundreds/thousands of genes, some of which are
degenerate in the sense that not all of the genes are mechanistically-related to any
one toxicological phenomenon. Whereas these chips are useful in broad-spectrum
screening, they are maturing at a substantial rate, in that gene arrays are now
becoming more specific, e.g. chips for the identification of changes in growth factor
families that contribute to the aetiology and development of chemically-induced
neoplasias.

Although [illegible] and explaining these genetic changes presents a
formidable obstacle to understanding the different mechanisms of development and
[illegible], several 'differential expression analysis' methods have been
developed which facilitate the identification of gene products that demonstrate
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ex P re 5 lon "^" s of one population compared to another. These methods 
have been used to identify differential gene expression in many situations, including
invading pathogenic microbes (Zhao et al. 1998), in cells responding to extracellular
and intracellular microbial invasion (Duguid and Dinauer 1990, Ragno et al. 1997,
Maldarelli et al. 1998), in chemically treated cells (Syed et al. 1997, Rockett et al.
1999), neoplastic cells (Liang et al. 1992, Chang and Terzaghi-Howe 1998),
activated cells (Gurskaya et al. 1996, Wan et al. 1996), differentiated cells (Hara et
al. 1991, Guimaraes et al. 1995a, b), and different cell types (Davis et al. 1984,
Hedrick et al. 1984, Xhu et al. 1998). Although differential expression analysis
technologies are applicable to a broad range of models, perhaps their most important
advantage is that, in most cases, absolutely no prior knowledge of the specific genes
which are up- or down-regulated is required.

The field of differential expression analysis is a large and complex one, with 
many techniques available to the potential user. These can be categorized into 
several methodological approaches, including: 

(1) Differential screening, 

(2) Subtractive hybridization (SH) (includes methods such as chemical
cross-linking subtraction (CCLS), suppression-PCR subtractive hybridization
(SSH), and representational difference analysis (RDA)),

(3) Differential display (DD),

(4) Restriction endonuclease-facilitated analysis (including serial analysis of gene
expression (SAGE) and gene expression fingerprinting (GEF)),

(5) Gene expression arrays, and 

(6) Expressed sequence tag (EST) analysis. 

The above approaches have been used successfully to isolate differentially
expressed genes in different model systems. However, each method has its own
subtle (and sometimes not so subtle) characteristics which incur various advantages
and disadvantages. Accordingly, it is the purpose of this review to clarify the
mechanistic principles underlying the main differential expression methods and to
highlight some of the broader considerations and implications of this very powerful
and increasingly popular technique. Specifically, we will concentrate on the so-
called 'open' systems, namely those which do not require any knowledge of gene
sequences and, therefore, are useful for isolating unknown genes. Two 'closed'
systems (those which utilize previously identified gene sequences), EST analysis and the
use of DNA arrays, will also be considered briefly for completeness. Whilst
emphasis will often be placed on suppression PCR subtractive hybridization (SSH,
the approach employed in this laboratory), it is the aim of the authors to highlight,
wherever possible, those areas of common interest to those who use, or intend to use,
differential gene expression analysis.



Differential cDNA library screening (DS)

Despite the development of multiple technological advances which have recently
brought the field of gene expression profiling to the forefront of molecular analysis,
recognition of the importance of differential gene expression and characterization of
differentially expressed genes has existed for many years. One of the original
approaches to identify such genes was described 20 years ago by St John and
Davis (1979). These authors developed a method, termed 'differential plaque filter
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hybridization', which was used to isolate galactose-inducible DNA sequences from 
yeast. The theory is simple: a genomic DNA library is prepared from normal,
unstimulated cells of the test organism/tissue and multiple filter replicas are
prepared. These replica blots are probed with radioactively (or otherwise) labelled
complex cDNA probes prepared from the control and test cell mRNA populations.
Those mRNAs which are differentially expressed in the treated cell population will
show a positive signal only on the filter probed with cDNA from the treated cells.
Furthermore, labelled cDNA from different test conditions can be used to probe
multiple blots, thereby enabling the identification of mRNAs which are only up-
regulated under certain conditions. For example, St John and Davis (1979) screened
replica filters with acetate-, glucose- and galactose-derived probes in order to obtain
genes induced specifically by galactose metabolism. Although groundbreaking in its
time, this method is now considered insensitive and time-consuming, as up to 2
months are required to complete the identification of genes which are differentially
expressed in the test population. In addition, there is no convenient way to check
that the procedure has worked until the whole process has been completed.
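
The replica-filter comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only: the clone identifiers and the sets of positive clones are hypothetical and are not data from St John and Davis (1979). The point it shows is that a galactose-specific clone is one that gives a signal with the galactose-derived probe but with neither the acetate- nor the glucose-derived probe.

```python
# Hypothetical sketch of scoring replica filters in differential screening.
# Clone IDs and hybridization results are invented for illustration.

def condition_specific(positives_by_probe, target):
    """Return clones positive with the target probe and with no other probe."""
    others = set()
    for probe, clones in positives_by_probe.items():
        if probe != target:
            others |= clones
    return positives_by_probe[target] - others

# Each set lists the library clones that give a signal on one replica filter.
positives = {
    "acetate": {"c01", "c02", "c07"},
    "glucose": {"c01", "c03", "c07"},
    "galactose": {"c01", "c04", "c05", "c07"},
}

galactose_specific = sorted(condition_specific(positives, "galactose"))
print(galactose_specific)
```

Clones that light up on every filter (here c01 and c07) are treated as constitutively expressed and drop out of the result.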

Subtractive Hybridization (SH) 

The developing concept of differential gene expression and the success of early
approaches such as that described by St John and Davis (1979) soon gave rise to a
search for more convenient methods of analysis. One of the first to be developed was
SH, numerous variations of which have since been reported (see below). In general,
the methodology involves hybridization of mRNA/cDNA from one population (tester)
to excess mRNA/cDNA from another (driver), followed by separation of the
unhybridized tester fraction (differentially expressed) from the hybridized common
sequences. This step has been achieved physically, chemically and through the use
of selective polymerase chain reaction (PCR) techniques.

Physical separation 

Original subtractive hybridization technology involved the physical separation
of hybridized common species from unique single stranded species. Several methods
of achieving this have been described, including hydroxyapatite chromatography
(Sargent and Dawid 1983), avidin-biotin technology (Duguid and Dinauer 1990)
and oligodT-latex separation (Hara et al. 1991). In the first approach, common
mRNA species are removed by cDNA (from test cells)-mRNA (from control cells)
subtractive hybridization followed by hydroxyapatite chromatography, as
hydroxyapatite specifically adsorbs the cDNA-mRNA hybrids. The unabsorbed cDNA is
then used either for the construction of a cDNA library of differentially expressed
genes (Sargent and Dawid 1983, Schneider et al. 1988) or directly as a probe to
screen a preselected library (Zimmerman et al. 1980, Davis et al. 1984, Hedrick et al.
1984). A schematic diagram of the procedure is shown in figure 1.

Less rigorous physical separation procedures coupled with sensitivity enhancing
PCR steps were later developed as a means to overcome some of the problems
encountered with the hydroxyapatite procedure. For example, Duguid and Dinauer
(1990) described a method of subtraction utilizing biotin-affinity systems as a means
to remove hybridized common sequences. In this process, both the control and
tester mRNA populations are first converted to cDNA and an adaptor ('oligovector',
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Control (driver) mRNA 



Tester (test) cDNA (1st strand)

Mix (ratio >35:1) & hybridize

Hydroxyapatite chromatography: RNA x DNA hybrids and mRNA removed;
unhybridized cDNA (differentially expressed) retained

Sepharose CL6B exclusion chromatography: small cDNA fragments (<450 bp) removed

Enriched, differentially expressed cDNA

Produce clones, or label directly and probe library

Figure 1. The hydroxyapatite method of subtractive hybridization. cDNA derived from the
treated/altered (tester) population is mixed with a large excess of mRNA from the control (driver)
population. Following hybridization, mRNA-cDNA hybrids are removed by hydroxyapatite
chromatography. The only cDNAs which remain are those which are differentially expressed in
the treated/altered population. In order to facilitate the recovery of full length clones, small cDNA
fragments are removed by exclusion chromatography. The remaining cDNAs are then cloned into
a vector [illegible], or labelled and used directly to probe a library, as described by Sargent
and Dawid (1983).



containing a restriction site) ligated to both sides. Both populations are then 
amplified by PCR, but the driver cDNA population is subsequently digested with 
the adaptor-containing restriction endonuclease. This serves to cleave the oligo- 
vector and reduce the amplification potential of the control population. The digested 
control population is then biotinylated and an excess mixed with tester cDNA. 
Following denaturation and hybridization, the mix is applied to a biocytin column 
(streptavidin may also be used) to remove the control population, including 
heteroduplexes formed by annealing of common sequences from the tester 
population. The procedure is repeated several times following the addition of fresh 
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Control (driver) mRNA 


Test (tester) mRNA

Anneal mRNA to polydT latex beads

cDNA synthesis

Mix and anneal

Centrifuge beads, collect and store supernatant,
dissociate polyA, reapply supernatant

Tester-specific mRNA retrieved after
4 rounds of hybridization

cDNA synthesis

Ligate adaptors and insert into vector

Sequence inserts and/or carry out
other downstream applications

Figure 2. The use of oligo(dT) latex to perform subtractive hybridization. mRNA extracted from the
control (driver) population is converted to anchored cDNA using polydT oligonucleotides 
attached to latex beads. mRNA from the treated/altered (tester) population is repeatedly 
hybridized against an excess of the anchored driver cDNA. The final population of mRNA is 
tester specific and can be converted into cDNA for cloning and other downstream applications, as 
described by Hara et al. (1991). 
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. co^oL?pN^In.ordexto further enrich those speaies differentially expressed in 
the tester cDNA, the subtracted tester population is amplified by PCR following
every second subtraction cycle. After six cycles of subtraction (three reamplification
steps) the reaction mix is ligated into a vector for further analysis.

In a slightly different approach, Hara et al. (1991) utilized a method whereby
oligo(dT) primers attached to a latex substrate are used to first capture mRNA
extracted from the control population. Following 1st strand cDNA synthesis, the
RNA strand of the heteroduplexes is removed by heat denaturation and
centrifugation (the cDNA-oligotex-dT forms a pellet and the supernatant is removed).
A quantity of tester mRNA is then repeatedly hybridized to the immobilized control
(driver) cDNA (which is present in 20-fold excess). After several rounds of
hybridization the only mRNA molecules left in the tester mRNA population are
those which are not found in the driver cDNA-oligotex-dT population. These
tester-specific mRNA species are then converted to cDNA and, following the
addition of adaptor sequences, amplified by PCR. The PCR products are then
ligated into a vector for further analysis using restriction sites incorporated into the
PCR primers. A schematic illustration of this subtraction process is shown in figure 2.

However, all these methods utilising physical separation have been described as
inefficient due to the requirement for large starting amounts of mRNA, significant
loss of material during the separation process and a need for several rounds of
hybridization. Hence, new methods of differential expression analysis have recently
been designed to eliminate these problems.



Chemical Cross-Linking Subtraction (CCLS) 

In this technique, originally described by Hampson et al. (1992), driver mRNA
is mixed with tester cDNA (1st strand only) in a ratio of > 20:1. The common
sequences form cDNA:mRNA hybrids, leaving the tester specific species as single
stranded cDNA. Instead of physically separating these hybrids, they are inactivated
chemically using 2,5-diaziridinyl-1,4-benzoquinone (DZQ). Labelled probes are
then synthesized from the remaining single stranded cDNA species (unreacted
mRNA species remaining from the driver are not converted into probe material due
to specificity of Sequenase T7 DNA polymerase used to make the probe) and used
to screen a cDNA library made from the tester cell population. A schematic diagram
of the system is shown in figure 3.

It has been shown that the differentially expressed sequences can be enriched at
least 300-fold with one round of subtraction (Hampson et al. 1992) and that the
technique should allow isolation of cDNAs derived from transcripts that are present
at less than 50 copies per cell. This equates to genes at the low end of intermediate
abundance (see table 1). The main advantages of the CCLS approach are that it is
rapid, technically simple and also produces fewer false positives than other
differential expression analysis methods. However, like the physical separation
protocols, a major drawback with CCLS is the large amount of starting material
required (at least 10 µg RNA). Consequently, the technique has recently been
refined so that a renewable source of RNA can be generated. The degenerate random
oligonucleotide primed (DROP) adaptation (Hampson et al. 1996, Hampson and
Hampson 1997) uses random hexanucleotide sequences to prime solid phase-
synthesized cDNA. Since each primer includes a T7 polymerase promoter sequence
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Control (driver) mRNA 




Test (tester) mRNA

1st strand cDNA synthesis, followed by alkaline hydrolysis

Mix and anneal: mRNA:cDNA hybrids and unique cDNA species formed

Cross linking agent (DZQ) added

Hybrids are cross-linked

Probes synthesised from single stranded cDNA
species and used to probe cDNA library



Table 1. The abundance of mRNA species and classes in a typical mammalian cell.

mRNA class     Copies of each   No. of mRNA        Mean % of each     Mean mass (ng) of each
               species/cell     species in class   species in class   species/µg total RNA

Abundant       12000            4                  3.3                1.65
Intermediate   300              500                0.08               0.04
Rare           15               11000              0.004              0.002

Modified from Bertioli et al. (1995).
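
The last two columns of Table 1 follow arithmetically from the first two, and the short Python sketch below reproduces them. It rests on two assumptions that are not stated in the text: that every mRNA molecule has the same mass, and that mRNA makes up 5% of total RNA (50 ng per µg), a figure inferred here from the table values themselves.

```python
# Reproduce the derived columns of Table 1 from copies/cell and species count.
# ASSUMPTIONS (not stated in the source): all mRNA molecules have equal mass,
# and mRNA is 5% of total RNA, i.e. 50 ng of mRNA per ug of total RNA.

MRNA_NG_PER_UG_TOTAL_RNA = 50.0

classes = {
    # class: (copies of each species per cell, number of species in class)
    "Abundant": (12000, 4),
    "Intermediate": (300, 500),
    "Rare": (15, 11000),
}

# Total number of mRNA molecules per cell across all classes.
total_copies = sum(copies * n for copies, n in classes.values())

def per_species(copies):
    """Return (percent of all mRNA molecules, ng per ug total RNA) for one species."""
    fraction = copies / total_copies
    return 100.0 * fraction, fraction * MRNA_NG_PER_UG_TOTAL_RNA

for name, (copies, _) in classes.items():
    pct, ng = per_species(copies)
    print(f"{name:<13}{pct:9.4f} %{ng:9.4f} ng")
```

Under these assumptions a single rare species contributes only about 0.002 ng per µg of total RNA, which helps explain why methods needing large amounts of starting material struggle with rare transcripts.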
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at the 5'end, the final pool of random cDNA fragments is a PCR-renewable cDNA 
population which is representative of the expressed gene pool and can be used to
synthesize sense RNA for use as driver material. Furthermore, if the final pool of
random cDNA fragments is reamplified using biotinylated T7 primer and random
hexamer, the product can be captured with streptavidin beads and the antisense
strand eluted for use as tester. Since both target and driver can be generated from
the same DROP product, subtraction can be performed in both directions (i.e. for
up- and down-regulated species) between two different DROP products.

Representational Difference Analysis (RDA) 

RDA of cDNA (Hubank and Schatz 1994) is an extension of the technique
originally applied to genomic DNA as a means of identifying differences between
two complex genomes (Lisitsyn et al. 1993). It is a process of subtraction and
amplification involving subtractive hybridization of the tester in the presence of
excess driver. Sequences in the tester that have homologues in the driver are
rendered unamplifiable, whereas those genes expressed only in the tester retain the
ability to be amplified by PCR. The procedure is shown schematically in figure 4.

In essence, the driver and tester mRNA populations are first converted to cDNA
and amplified by PCR following the ligation of an adaptor. The adaptors are then
removed from both populations and a new (different) adaptor ligated to the
amplified tester population only. Driver and tester populations are next melted and
hybridized at a driver:tester ratio of 100:1. Following hybridization, only tester:tester
homohybrids have an adaptor at both ends of the DNA duplex and can, thus, be filled
in at both 3' ends. Hence, only these molecules are amplified exponentially during
the subsequent PCR step. Although tester:driver heterohybrids are present, they
only amplify in a linear fashion, since the strand derived from the driver has no
adaptor to which the primer can bind. Driver:driver homohybrids have no
adaptors and, therefore, are not amplified. Single stranded molecules are digested
with mung bean nuclease before a further PCR-enrichment of the tester:tester
homohybrids. The adaptors on the amplified tester population are then replaced and
the whole process repeated a further two or three times using an increasing excess of
driver (a tester:driver ratio of 1:400, 1:80000 and
1:800000 for the second, third and fourth hybridizations, respectively). Different
adaptors are ligated to the tester between successive rounds of hybridization and
amplification to prevent the accumulation of PCR products that might interfere with
subsequent amplifications. The final display is a series of differentially expressed
gene products easily observable on an ethidium bromide gel.
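
The selective step in RDA depends on the difference between exponential, linear and no amplification. The Python sketch below compares the three hybrid types under an idealized model; the function name, the perfect doubling per cycle and the cycle count are assumptions made for illustration, and real reactions are less efficient and reach a plateau.

```python
# Idealized PCR yield for the three hybrid types formed during RDA.
# ASSUMPTION: perfect efficiency; one new strand per cycle for linear
# amplification. Real reactions plateau and are less efficient.

def relative_yield(hybrid, cycles):
    """Molecules present after `cycles` PCR cycles per starting duplex (idealized)."""
    if hybrid == "tester:tester":   # adaptor on both strands: exponential
        return 2 ** cycles
    if hybrid == "tester:driver":   # adaptor on one strand only: linear
        return 1 + cycles
    if hybrid == "driver:driver":   # no adaptor: not amplified
        return 1
    raise ValueError(f"unknown hybrid type: {hybrid}")

for hybrid in ("tester:tester", "tester:driver", "driver:driver"):
    print(hybrid, relative_yield(hybrid, 20))
```

Even under this simple model, 20 cycles separate the tester:tester product from the tester:driver product by roughly five orders of magnitude, which is why the difference products dominate the final display.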

The main advantages of RDA are that it offers a reproducible and sensitive
approach to the analysis of differentially expressed genes. Hubank and Schatz (1994)
reported that they were able to isolate genes that were differentially expressed in
substantially less than 1% of the cells from which the tester is derived. Perhaps the
main drawback is that multiple rounds of ligation, hybridization, amplification and
digestion are required. The procedure is, therefore, lengthier than many other
differential display approaches and provides more opportunity for operator-induced
error to occur. Although the generation of false positives has been noted, this has
been solved to some degree by O'Neill and Sinclair (1997) through the use of HPLC-
purified adaptors. These are free of the truncated adaptors which appear to be a
major source of the false positive bands. A very similar technique to RDA, termed
linker capture subtraction (LCS), was described by Yang and Sytkowski (1996)
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^^'(^cDNA _ -dsaest (tester) cDNA 



Digest with restriction enzyme

Ligate to dephosphorylated 12/24 adaptor strands

Melt 12mer

Fill in 3' ends (Taq), add primer and amplify

Digest (driver); digest and ligate new 12/24 adaptor (tester)

Mix 100:1, melt and hybridize

Fill in ends, add primer and amplify:
linear amplification, exponential amplification or no amplification

Digest PCR products with mung bean nuclease to remove
ssDNA molecules present after amplification

First difference

Figure 4. The representational difference analysis (RDA) technique. Driver and tester cDNA are
digested with a 4-cutter restriction enzyme such as DpnII. The 1st set of [illegible]
population, after which the tester is hybridized against a large excess of driver. The [illegible]
difference products, as described by Lisitsyn et al. (1993) and Hubank and Schatz (1994).
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Suppression PCR Subtractive Hybridization (SSH)+ 

The most recent adaptation of the SH approach to differential expression
analysis was first described by Diatchenko et al. (1996) and Gurskaya et al. (1996).
They reported that a 1000-5000 fold enrichment of rare cDNAs (equivalent to
isolating mRNAs present at only a few copies per cell) can be obtained without the
need for multiple hybridizations/subtractions. Instead of physical or chemical
removal of the common sequences, a PCR-based suppression system is used (see
figure 5).

In SSH, excess driver cDNA is added to two portions of the tester cDNA which
have been ligated with different adaptors. A first round of hybridization serves to
enrich differentially expressed genes and equalize rare and abundant messages.
Equalization occurs since reannealing is more rapid for abundant molecules than for
rare molecules due to the second order kinetics of hybridization (James and Higgins
1985). The two primary hybridization mixes are then mixed together in the presence
of excess driver and allowed to hybridize further. This step permits the annealing of
single stranded complementary sequences which did not hybridize in the primary
hybridization, and in doing so generates templates for PCR amplification. Although
there are several possible combinations of the single stranded molecules present in
the secondary hybridization mix, only one particular combination (differentially
expressed in the tester cDNA composed of complementary strands having different
adaptors) can amplify exponentially.
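
The equalization effect attributed above to second order kinetics can be illustrated numerically. The Python sketch below uses the standard second-order rate law, dC/dt = -kC^2, whose solution is C(t) = C0 / (1 + k C0 t). It is a simplified model that considers only self-reannealing of each species; the value of k·t is arbitrary, and the starting amounts simply reuse the copies-per-cell figures of table 1 as relative concentrations.

```python
# Second-order reannealing: dC/dt = -k*C**2  =>  C(t) = C0 / (1 + k*C0*t)
# ASSUMPTIONS: only self-reannealing is modelled; k*t is an arbitrary
# illustrative value; starting amounts are relative units, not measurements.

def single_stranded_remaining(c0, kt):
    """Concentration still single stranded after hybridization for time t."""
    return c0 / (1.0 + kt * c0)

abundant0, rare0 = 12000.0, 15.0   # relative starting concentrations
kt = 0.001                         # rate constant multiplied by time

abundant = single_stranded_remaining(abundant0, kt)
rare = single_stranded_remaining(rare0, kt)

print(f"abundant:rare ratio before hybridization: {abundant0 / rare0:.0f}")
print(f"abundant:rare ratio after hybridization:  {abundant / rare:.1f}")
```

Because the abundant species reanneals far faster than the rare one, the single stranded fraction left for the secondary hybridization is much closer to equal representation than the starting material, although, as in practice, the equalization is not perfect.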

Having obtained the final differential display, two options are available if cloning
of cDNAs is desired. One is to transform the whole of the final PCR reaction into
competent cells. Transformed colonies can then be isolated and their inserts
characterized by sequencing, restriction analysis or PCR. Alternatively, the final
PCR products can be resolved on a gel and the individual bands excised, reamplified
and cloned. The first approach is technically simpler and less time consuming.
However, ligation/transformation reactions are known to be biased towards the
cloning of smaller molecules, and so the final population of clones will probably not
contain a representative selection of the larger products. In addition, although
equalization theoretically occurs, observations in this laboratory suggest that this is
by no means perfectly accomplished. Consequently, some gene species are present
in a higher number than others and this will be represented in the final population
of clones. Thus, in order to obtain a substantial proportion of those gene species that
actually demonstrate differential expression in the tester population, the number of
clones that will have to be screened after this step may be substantial. The second
approach is initially more time consuming and technically demanding. However, it
would appear to offer better prospects for cloning larger and low abundance gene
products. In addition, one can incorporate a screening step that differentiates
different products of different sequences but of the same size (HA-staining, see
later). In this way, a good idea of the final number of clones to be isolated and
identified can be achieved.

An alternative (or even complementary) approach is to use the final differential
display reaction to screen a cDNA library to isolate full length clones for further
characterization, or a DNA array (see later) to quickly identify known genes. SSH
has been used in this laboratory to begin characterization of the short-term gene
expression profiles of enzyme-inducers such as phenobarbital (Rockett et al. 1997)
and Wy-14,643 (Rockett et al. unpublished observations). The isolation of
differentially expressed genes in this manner enables the construction of a fingerprint
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Tester cDNA with adaptor 1 



Driver cDNA (in excess)

Tester cDNA with adaptor 2

Mix samples, add fresh denatured driver, anneal

Molecule types a, b, c, d & e

Add primers and amplify by PCR

a, d: no amplification
b: no amplification, suppressed due to formation of panhandle structure
c: linear amplification
e: exponential amplification

Figure 5. PCR-select cDNA subtraction. In the primary hybridization, an excess of driver cDNA is
added to each tester population. The samples are heat denatured and allowed to hybridize
[illegible]. This serves two purposes: (1) to equalize rare and abundant molecules; and
(2) to enrich for differentially expressed sequences; cDNAs that are not differentially expressed
form type c molecules with the driver. In the secondary hybridization, the two primary
hybridizations are mixed together without denaturing. Fresh denatured driver can also be added
at this point to allow further enrichment of differentially expressed sequences. Type e molecules
are formed in this secondary hybridization which are subsequently amplified using two rounds of
PCR. The final products can be visualized on an agarose gel, labelled directly or cloned into a
vector for downstream manipulation. As described by Diatchenko et al. (1996) and Gurskaya
et al. (1996), with permission.






[Figure 6 flow diagram, steps only: control animals and treated animals; extract mRNA
from tissue of interest (e.g. liver); DNase treatment; convert to cDNA (also used as
complex probe for screening clones); hybridization, subtraction and amplification
(control driving tester for up-regulated genes, tester driving control for
down-regulated genes); run out products on agarose gel; extract individual bands and
clone in T/A vector; PCR of 5-10 clone cultures per extracted band; screen using
standard and HA agarose; different clones blotted and screened with up-regulated or
down-regulated genes; differentially expressed clones selected; plasmid mini-preps of
selected clones; sequencing and identification.]

Figure 6. Flow diagram showing method used in this laboratory to isolate and identify clones of genes
which are differentially expressed in rat liver following short term exposure to the enzyme
inducers, phenobarbital and Wy-14,643.

of expressed genes which are unique to each compound and time/dose point. Such 
information could be useful in short-term characterization of the toxic potential of 
new compounds by comparing the gene-expression profiles they elicit with those 
produced by known inducers. Figure 6 shows a flow diagram of the method used to 
isolate, verify and clone differentially expressed genes, and figure 7 shows expression 
profiles obtained from a typical SSH experiment. Subsequent sub-cloning of the 
individual bands, sequencing and gene data base interrogation reveals many genes 
which are either up- or down-regulated by phenobarbital in the rat (tables 2 and 3). 

One of the advantages in using the SSH approach is that no prior knowledge is
required of which specific genes are up/down-regulated subsequent to xenobiotic
exposure, and an almost complete complement of genes is obtained. For example,
the peroxisome proliferator and non-genotoxic hepatocarcinogen Wy-14,643 up-
regulates at least 28 genes and down-regulates at least 15 in the rat (a sensitive
species) and produces 48 up- and 37 down-regulated genes in the guinea pig (a
resistant species) (Rockett, Swales, Esdaile and Gibson, unpublished observations).
One of these genes, CD81, was up-regulated in the rat and down-regulated in the
guinea pig following Wy-14,643 treatment. CD81 (alternatively named TAPA-1) is
a widely expressed cell surface protein which is involved in a large number of cellular
processes, including [...] proliferation and differentiation (Levy et
al. 1998). Since all of these functions are altered to some extent in the phenomena
of hepatomegaly and non-genotoxic hepatocarcinogenesis, it is intriguing, and
probably mechanistically relevant, that CD81 expression is differentially regulated
in a resistant and a susceptible species. However, the down-side of this approach is
that the majority of genes can be sequenced and matched to database sequences, but
the latter are predominantly expressed sequence tags or genes of completely
unknown function, thus partially obscuring a realistic overall assessment of the
critical genes of genuine biological interest. Notwithstanding the lack of complete
functional identification of altered gene expression, such gene profiling studies
essentially provide a 'molecular fingerprint' in response to xenobiotic challenge,
[...] mechanistically relevant [...] detailed [...].

[Figure 7 caption fragment, largely illegible in the source: ... liver following 3-day
treatment with Wy-14,643 or ...]

Differential Display (DD) 

Originally described as 'RNA fingerprinting by arbitrarily primed PCR' (Liang
and Pardee 1992), this method is now more commonly referred to as 'differential
Table 2. Genes up-regulated in rat liver following 3-day exposure to phenobarbital.

Band number                 Highest sequence
(approximate size in bp)    similarity          FASTA-EMBL gene identification

5 (1300)                    93.5%               CYP2B1
7 (1000)                    95.1%               Preproalbumin / serum albumin mRNA
8 (950)                     98.3%               NCI-CGAP-Pr1 H. sapiens (EST)
10 (850)                    95.7%               CYP2B1
11 (800)     Clone 1        94.9%               CYP2B1
             Clone 2        75.3%               CYP2B2
12 (750)                    93.8%               TRPM-2 mRNA / sulfated glycoprotein
15 (600)                    92.9%               Preproalbumin / serum albumin mRNA
16 (550)     Clone 1        95.2%               CYP2B1
             Clone 2        93.6%               Haptoglobulin mRNA partial alpha
21 (350)                    99.3%               18S, 5.8S & 28S rRNA

Bands 1-4, 6, 9, 13, 14, and 17-20 were shown to be false positives by dot blot analysis and therefore
were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes do not
represent the complete spectrum of genes which are up-regulated in rat liver by phenobarbital but
simply represent the genes sequenced and identified to date.



Table 3. Genes down-regulated in rat liver following 3-day exposure to phenobarbital.

Band number                 Highest sequence
(approximate size in bp)    similarity          FASTA-EMBL gene identification

1 (1500)                    95.3%               3-oxoacyl-CoA thiolase
2 (1200)                    92.3%               Hemopexin mRNA
3 (1000)                    91.7%               Alpha-2u-globulin mRNA
7 (700)      Clone 1        77.2%               M. musculus C1 inhibitor
             Clone 2        94.5%               Electron transfer flavoprotein
             Clone 3        91.0%               M. musculus Topoisomerase 1 (Topo 1)
8 (650)      Clone 1        86.9%               Soares 2NbMT M. musculus (EST)
             Clone 2        96.2%               Alpha-2u-globulin (s-type) mRNA
9 (600)      Clone 1        86.9%               Soares mouse NML M. musculus (EST)
             Clone 2        82.0%               Soares p3NMF19.5 M. musculus (EST)
10 (550)                    73.8%               Soares mouse NML M. musculus (EST)
11 (525)                    95.7%               NCI-CGAP-Pr1 H. sapiens (EST)
12 (375)                    100.0%              Ribosomal protein
13 (230)     Clone 1        97.2%               Soares mouse embryo NbME135 (EST)
             Clone 2        100.0%              Fibrinogen B-beta-chain
             Clone 3        100.0%              Apolipoprotein E gene
14 (170)                    96.0%               Soares p3NMF19.5 M. musculus (EST)
15 (140)                    97.3%               Stratagene mouse testis (EST)
Others: (300)               96.7%               R. norvegicus RASP 1 mRNA
        (275)               93.1%               Soares mouse mammary gland (EST)

EST = Expressed sequence tag. Bands 4-6 were shown to be false positives by dot blot analysis and,
therefore, were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes
do not represent the complete spectrum of genes which are down-regulated in rat liver by phenobarbital
but simply represent the genes sequenced and identified to date.



display' (DD). In this method, all the mRNA species in the control and treated cell
populations are amplified in separate reactions using reverse transcriptase-PCR
(RT-PCR). The products are then run side-by-side on sequencing gels. Those
bands which are present in one display only, or which are much more intense in one
display compared to the other, are differentially expressed and may be recovered for
further characterization. One advantage of this system is the speed with which it can
be carried out: [...] to obtain a display and as little as a week to make and identify
clones.

Two commonly used variations are based on different methods of priming the
reverse transcription step (figure 8). One is to use an oligo dT with a 2-base 'anchor'
at the 3'-end, e.g. 5' (dTn)CA 3' (Liang and Pardee 1992). Alternatively, an
arbitrary primer may be used for 1st strand cDNA synthesis (Welsh et al. 1992).
This form of RNA fingerprinting has also been called 'RAP' (RNA Arbitrarily
Primed)-PCR. One advantage of this second approach is that PCR products may be
derived from anywhere in the RNA, including open reading frames. In addition, it
can be used for mRNAs that are not polyadenylated, such as many bacterial mRNAs
(Wong and McClelland 1994). In both cases, following reverse transcription and
denaturation, second strand cDNA synthesis is carried out with an arbitrary primer
(arbitrary primers have a single base at each position, as compared to random
primers, which contain a mixture of all four bases at each position). The resulting
PCR thus produces a series of products which, depending on the system (primer
length and composition, polymerase and gel system), usually includes 50-100
products per primer set (Band and Sager 1989). When a combination of different
dT-anchors and arbitrary primers is used, almost all mRNA species from a cell can
be amplified. When the cDNA products from two different populations are analysed
side by side on a polyacrylamide gel, differences in expression can be identified and
the appropriate bands recovered for cloning and further analysis.
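
The anchored-primer idea can be illustrated with a short Python sketch. It is not taken from the cited papers: the choice of G, C or A for each anchor base follows the Figure 8 caption, while the oligo(dT) length of 12, the function names and the toy message are assumptions made purely for illustration.

```python
from itertools import product

ANCHOR_BASES = "GCA"  # per the Figure 8 caption: N = G, C or A
DT_LENGTH = 12        # assumed length of the oligo(dT) run, illustration only


def anchored_primers(dt_length=DT_LENGTH, anchor_bases=ANCHOR_BASES):
    """Return every 5'-(dT)n-NN-3' primer carrying a two-base 3' anchor."""
    return ["T" * dt_length + "".join(pair)
            for pair in product(anchor_bases, repeat=2)]


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def primes_on(primer, message):
    """True if the primer can anneal somewhere on the message.

    The message is written 5'->3' in DNA letters (T in place of U)
    and ends in its poly(A) tail.
    """
    return reverse_complement(primer) in message


if __name__ == "__main__":
    primers = anchored_primers()
    print(len(primers), "anchored primers")
    toy_message = "GGCTAGCTTG" + "A" * 20  # hypothetical message ending ...UG-poly(A)
    print([p for p in primers if primes_on(p, toy_message)])
```

Only the primer whose anchor complements the two bases immediately upstream of the poly(A) tail anneals, which is why each anchored primer samples a subset of the mRNA population and several primer sets are needed to cover most messages.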

Although DD is perhaps the most popular approach used today for identifying
differentially expressed genes, it does suffer from several perceived disadvantages:

(1) It may have a bias towards high copy number mRNAs (Bertioli et al.
1995), although this has been disputed (Wan et al. 1996) and the isolation of very
low abundance genes may be achieved in certain circumstances (Guimeraes et
al. [...]).

(2) The cDNAs obtained often only represent the extreme 3' end of the mRNA
(often the 3'-untranslated region), although this may not always be the case
(Guimeraes et al. 1995a). Since the 3' end is often not included in Genbank and
shows variation between organisms, cDNAs identified by DD cannot always be
matched with their genes, even if they have been identified.

(3) The pattern of differential expression seen on the display often cannot be
reproduced on Northern blots, with false positives arising in up to 70% of cases
(Sun et al. 1994). Some adaptations have been shown to reduce false positives,
including the use of two reverse transcriptases (Sung and Denman 1997),
comparison of uninduced and induced cells over a time course (Burn et al. 1994)
and comparison of DDPCR-products from two uninduced and two induced
lines (Sompayrac et al. 1995). The latter authors also reported that the use of
cytoplasmic RNA rather than total RNA reduces false positives arising from
nuclear RNA that is not transported to the cytoplasm.

Further details of the background, strengths and weaknesses of the DD 
technique can be obtained from a review by McClelland et al. (1996) and from 
articles by Liang et al. (1995) and Wan et al. (1996). 
[Figure 8 diagram, labels only: mRNA ending -UGAAAAAAA primed with (dTn)CA, or mRNA
primed with an arbitrary primer; 1st strand cDNA; denature and synthesise 2nd strand
with any arbitrary primer; 2nd strand cDNA; cDNA can now be amplified by PCR using
original primer pair.]

Figure 8. Two approaches to differential display (DD) analysis. 1st strand synthesis can be carried out
either with a polydT(n)NN primer (where N = G, C or A) or with an arbitrary primer. The use of
different combinations of G, C and A to anchor the first strand polydT primer enables the priming
of the majority of polyadenylated mRNAs. Arbitrary primers may hybridize at none, one or more
places along the length of the mRNA, allowing 1st strand cDNA synthesis to occur at none, one
or more points in the same gene. In both cases, 2nd strand synthesis is carried out with an arbitrary
primer. Since these arbitrary primers for the 2nd strand may also hybridize to the 1st strand cDNA
in a number of different places, several different 2nd strand products may be obtained from one
binding point of the 1st strand primer. Following 2nd strand synthesis, the original set of primers
is used to amplify the second strand products, with the result that numerous gene sequences are
[...]



Restriction endonuclease-facilitated analysis of gene expression 
Serial Analysis of Gene Expression (SAGE) 

A more recent development in the field of differential display is SAGE analysis 
(Velculescu et al. 1995). This method uses a different approach to those discussed so 
far and is based on two principles. Firstly, in more than 95% of cases, short 
nucleotide sequences ('tags') of only nine or 10 base pairs provide sufficient 
information to identify their gene of origin. Secondly, concatenation (linking 
together in a series) of these tags allows sequencing of multiple cDNAs within a 
single clone. Figure 9 shows a schematic representation of the SAGE process. In this 
procedure, double stranded cDNA from the test cells is synthesized with a 
biotinylated polydT primer. Following digestion with a commonly cutting (4bp 
recognition sequence) restriction enzyme ('anchoring enzyme'), the 3' ends of the
cDNA population are captured with streptavidin beads. The captured population is
split into two and different adaptors ligated to the 5' ends of each group. Incorporated
into the adaptors is a recognition sequence for a type IIS restriction enzyme, one
which cuts DNA at a defined distance (< 20 bp) from its recognition sequence.
Hence, following digestion of each captured cDNA population with the IIS enzyme,
the adaptors plus a short piece of the captured cDNA are released. The two
populations are then ligated and the products amplified. The amplified products are
cleaved with the original anchoring enzyme, religated (concatemers are formed in
the process) and cloned. The advantage of this system is that hundreds of gene tags
can be identified by sequencing only a few clones. Furthermore, the number of times
a given transcript is identified is a quantitative measurement of that gene's
abundance in the original population, a feature which facilitates identification of
differentially expressed genes in different cell populations. 
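
The tag-counting principle behind SAGE can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published protocol: the anchoring site CATG is the one drawn in Figure 9, the tag length of 10 bp is one of the two values quoted in the text, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # anchoring-enzyme site as drawn in Figure 9
TAG_LENGTH = 10       # assumed; the text quotes tags of nine or 10 bp


def extract_tag(cdna, site=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag directly 3' of the 3'-most anchoring site, or None.

    Only the 3' end of each cDNA is captured on the beads, so the site
    nearest the poly(A) tail is the one that defines the tag.
    """
    pos = cdna.rfind(site)
    if pos == -1:
        return None  # transcript has no anchoring site and yields no tag
    start = pos + len(site)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def tag_counts(cdnas):
    """Count how often each tag occurs in a population of cDNAs."""
    return Counter(t for t in map(extract_tag, cdnas) if t is not None)


def compare(control, treated):
    """Map each tag to its (control count, treated count)."""
    c, t = tag_counts(control), tag_counts(treated)
    return {tag: (c[tag], t[tag]) for tag in sorted(set(c) | set(t))}


if __name__ == "__main__":
    a = "GGGCATGAAACCCGGGTTTACGT"      # hypothetical transcript A
    b = "TTCATGCCCATGTTTTGGGGCCCCAA"   # hypothetical transcript B, two sites
    print(compare([a, a, b], [a, b, b, b]))
```

Because each transcript contributes one tag, the tag counts stand in for transcript abundance, and comparing the counts from two populations highlights candidate differentially expressed genes.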

Some disadvantages of SAGE analysis are that the method is technically
difficult, requires a large amount of accurate sequencing, is biased towards abundant
mRNAs, has not been validated in the pharmaco/toxicogenomic setting and has
only been used to examine well known tissue differences to date.



Gene Expression Fingerprinting (GEF) 

A different capture/restriction digest approach for isolating differentially 
expressed genes has been described by Ivanova and Belyavsky (1995). In this
method, RNA is converted to cDNA using biotinylated oligo(dT) primers. The
cDNA population is then digested with a specific endonuclease and captured with
magnetic streptavidin microbeads to facilitate removal of the unwanted 5' digestion
products. The use of restricted 3'-ends alone serves to reduce the complexity of the
cDNA fragment pool and helps to ensure that each RNA species is represented by
not more than one restriction product. An adaptor is ligated to facilitate subsequent
amplification of the captured population. PCR is carried out with one
adaptor-specific and one biotinylated polydT primer. The reamplified population is
recaptured and the non-biotinylated strands removed by alkaline dissociation. The
non-biotinylated strand is then resynthesized using a different adaptor-specific
primer in the presence of a radiolabeled dNTP. The labelled immobilized 3' cDNA
ends are next sequentially treated with a series of different restriction endonucleases
and the products from each digestion analysed by PAGE. The result is a fingerprint
composed of a number of ladders (equal to the number of sequential digests used).
By comparing test versus control fingerprints, it is possible to identify differentially
expressed products which can then be isolated from the gel and cloned. The
advantages of this procedure are that it is very robust and reproducible and the
authors estimate that 80-93% of cDNA molecules are involved in the final
fingerprint. The disadvantage is that polyacrylamide gels can rarely resolve more 
than 300-400 bands, which compares poorly to the 1000 or more which are 
estimated to be produced in an average experiment. The use of 2-D gels such as 
those described by Uitterlinden et al. (1989) and Hatada et al. (1991) may help to 
overcome this problem. 

A similar method for displaying restriction endonuclease fragments was later 
described by Prashar and Weissman (1996). However, instead of sequential 
digestion of the immobilized 3'-terminal cDNA fragments, these authors simply
compared the profiles of the control and treated populations without further 
manipulation. 
[Figure 9 diagram, steps only: 1st strand cDNA synthesis using biotinylated-polydT
primers; cDNA cleaved with anchoring enzyme (AE, site CATG) and captured with
streptavidin beads; divide in half and ligate linkers; cleave with tagging enzyme (TE)
and produce blunt ends, giving constructs of the form GGATGCATGXXXXXXXXX and
GGATGCATGOOOOOOOOO (TE site, AE site, tag); ligate and amplify to give ditags
flanked by AE sites; cleave with AE, isolate ditags, concatenate, clone and sequence
(Tag 1, Tag 2, Tag 3, Tag 4 separated by CATG).]

Figure 9. Serial analysis of gene expression (SAGE) analysis. cDNA is cleaved with an anchoring enzyme
[...]






DNA arrays 

'Open' differential display systems are cumbersome in that it takes a great deal
of time to extract and identify candidate genes and then confirm that they are indeed
up- or down-regulated in the treated compared to the control tissue. Normally the
latter process is carried out using Northern blotting or RT-PCR. Even so, each of
the aforementioned steps produces a bottleneck to the ultimate goal of rapid analysis
of gene expression. These problems will likely be addressed by the development of
so-called DNA arrays (e.g. Gress et al. 1992, Zhao et al. 1995, Schena et al. 1996),
the introduction of which has signalled the next era in differential gene expression
analysis. DNA arrays consist of a gridded membrane or glass 'chips' containing
hundreds or thousands of DNA spots, each consisting of multiple copies of part of
a known gene. The genes are often selected based on previously proven involvement
in oncogenesis, cell cycling, DNA repair, development and other cellular processes.
They are usually chosen to be as specific as possible for each gene and animal species.
Human and mouse arrays are already commercially available and a few companies
will construct a personalized array to order, for example Clontech Laboratories and
Research Genetics Inc. The technique is rapid in that hundreds or even thousands
of genes can be spotted on a single array, and that mRNA/cDNA from the test
populations can be labelled and used directly as probe. When analysed with
appropriate hardware and software, arrays offer a rapid and quantitative means to
assess differences in gene expression between two cell populations. Of course, there
can only be identification and quantitation of those genes which are in the array
(hence the term 'closed' system). Therefore, one approach to elucidating the
molecular mechanisms involved in a particular disease/development system may be
to combine an open and closed system: a DNA array to directly identify and
quantitate the expression of known genes in mRNA populations, and an open
system such as SSH to isolate unknown genes which are differentially expressed.
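
As a rough illustration of how array signals from two populations might be compared, the Python sketch below computes a log2 ratio for each spot and lists spots that change at least two-fold. The spot names, signal values, floor value and two-fold cut-off are all assumptions chosen for the example; real array analyses also need background subtraction and normalization, which are omitted here.

```python
import math


def log2_ratios(control, treated, floor=1.0):
    """log2(treated / control) for every spot measured on both arrays.

    Signals below `floor` are raised to it so that empty spots do not
    cause a division by zero.
    """
    shared = control.keys() & treated.keys()
    return {spot: math.log2(max(treated[spot], floor) / max(control[spot], floor))
            for spot in shared}


def changed_spots(ratios, min_fold=2.0):
    """Return (up, down) lists of spots changing by at least min_fold."""
    cutoff = math.log2(min_fold)
    up = sorted(s for s, r in ratios.items() if r >= cutoff)
    down = sorted(s for s, r in ratios.items() if r <= -cutoff)
    return up, down


if __name__ == "__main__":
    # Hypothetical spot intensities in arbitrary units.
    control = {"spot_A": 100, "spot_B": 100, "spot_C": 200}
    treated = {"spot_A": 200, "spot_B": 50, "spot_C": 150}
    print(changed_spots(log2_ratios(control, treated)))
```

Only genes spotted on the array can appear in either list, which is the 'closed' limitation noted above.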

One of the main advantages of DNA arrays is the huge number of gene fragments
which can be put on a membrane; some companies have reported gridding up to
60000 spots on a single glass 'chip' (microscope slide). These high density chip-
based micro-arrays will probably become available as mass-produced off-the-shelf
items in the near future. This should facilitate the more rapid determination of
differential expression in time and dose-response experiments. Aside from their
high cost and the technical complexities involved in producing and probing DNA
arrays, the main problem which remains, especially with the newer micro-array
(gene-chip) technologies, is that results are often not wholly reproducible between
arrays. However, this problem is being addressed and should be resolved [...].



EST databases as a means to identify differentially expressed genes 

Expressed sequence tags (ESTs) are partial sequences of clones obtained from
cDNA libraries. Even though most ESTs have no formal identity (putative
identification is the best to be hoped for), they have proven to be a rapid and efficient
means of discovering new genes and can be used to generate profiles of gene
expression in specific cells. Since they were first described by Adams et al. (1991),
there has been a huge explosion in EST production and it is estimated that there are
now well over a million such sequences in the public domain, representing over half
of all human genes (Hillier et al. 1996). This large number of freely available
sequences (both sequence information and clones are normally available royalty-free
from the originators) has enabled the development of a new approach towards
differential gene expression analysis as described by Vasmatzis et al. (1998). The
approach is simple in theory: EST databases are first searched for genes that have a
number of related EST sequences from the target tissue of choice, but none or few
from non-target tissue libraries. Programmes to assist in the assembly of such sets of
overlapping data may be developed in-house or obtained privately or from the
internet. For example, the Institute for Genomic Research (TIGR, found at
http://www.tigr.org) provides many software tools free of charge to the scientific
community. Included amongst these is the TIGR assembler (Sutton et al. 1995), a
tool for the assembly of large sets of overlapping data such as ESTs, bacterial
artificial chromosomes (BACs), or small genomes. Candidate EST clones
representing different genes are then analysed using RNA blot methods for size and tissue
specificity and, if required, used as probes to isolate and identify the full length
cDNA clone for further characterization. In practice however, the method is rather
more involved, requiring bioinformatic and computer analysis coupled with
confirmatory molecular studies. Vasmatzis et al. (1998) have described several
problems in this fledgling approach, such as separating highly homologous
sequences derived from different genes and an overemphasis of specificity for some
EST sequences. However, since these problems will largely be addressed by the
development of more suitable computer algorithms and increased completeness
of the EST database, it is likely that this approach to identifying differentially
expressed genes may enjoy more patronage in the future.
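
The database search step described above ("a number of related EST sequences from the target tissue of choice, but none or few from non-target tissue libraries") can be sketched as a simple count. The record layout, cluster identifiers, tissue names and thresholds below are hypothetical and are not those used by Vasmatzis et al. (1998); the sketch also assumes the ESTs have already been grouped into gene clusters.

```python
from collections import defaultdict


def tissue_enriched(est_records, target, min_target=3, max_other=0):
    """Find EST clusters seen mostly in one tissue.

    est_records: iterable of (cluster_id, tissue) pairs, one per EST.
    Returns (cluster_id, ESTs in target tissue, ESTs elsewhere) tuples,
    most strongly represented cluster first.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for cluster, tissue in est_records:
        counts[cluster][tissue] += 1

    hits = []
    for cluster, by_tissue in counts.items():
        in_target = by_tissue.get(target, 0)
        elsewhere = sum(n for t, n in by_tissue.items() if t != target)
        if in_target >= min_target and elsewhere <= max_other:
            hits.append((cluster, in_target, elsewhere))
    return sorted(hits, key=lambda h: (-h[1], h[0]))


if __name__ == "__main__":
    # Hypothetical EST records: (cluster id, library tissue of origin).
    records = ([("c1", "target")] * 4
               + [("c2", "target")] * 3 + [("c2", "other")] * 2
               + [("c3", "other")] * 5
               + [("c4", "target")] * 2)
    print(tissue_enriched(records, "target"))
    print(tissue_enriched(records, "target", max_other=2))
```

Relaxing `max_other` trades specificity for sensitivity, which mirrors the point that highly homologous sequences and uneven library coverage make the real search less clean than the theory suggests.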



Problems and potential of differential expression techniques 
The holistic or single cell approach ? 

When working with in vivo models of differential expression, one of the first
issues to consider must be the presence of multiple cell types in any given specimen.
For example, a liver sample is likely to contain not only hepatocytes but also
(potentially) Ito cells, bile ductule cells, endothelial cells, various immune cells (e.g.
lymphocytes, macrophages and Kupffer cells) and fibroblasts. Other tissues will
each have their own distinctive cell populations. Also, in the case of neoplastic tissue,
there are almost always normal, hyperplastic and/or dysplastic cells present in a
sample. One must, therefore, be aware that genes obtained from a differential
display experiment performed on an animal tissue model may not necessarily arise
exclusively from the intended 'target' cells, e.g. hepatocytes/neoplastic cells. If
appropriate, further analyses using immunohistochemistry, in situ hybridization or
in situ RT-PCR should be used to confirm which cell types are expressing the
gene(s) of interest. This problem is probably most acute for those studying the
differential expression of genes in the development of different cell types, where
there is a need to examine homologous cell populations. The problem is now being
addressed at the National Cancer Institute (Bethesda, MD, USA), where new
micro-dissection techniques have been employed to assist in their gene analysis programme,
the Cancer Genome Anatomy Project (CGAP) (for more information see web site:
http://www.ncbi.nlm.nih.gov/ncicgap/intro.html). There are also separation
techniques available that utilise cell-specific antigens as a means to isolate target cells,
such as fluorescence activated cell sorting (FACS) (Dunbar et al. 199[?], Kas-Deelen et
al. 1998) and magnetic bead technology (Richard et al. 1998, Rogler et al. 1998).

However, those taking a holistic approach may consider this issue unimportant.
There is an equally appropriate view that all those genes showing altered expression
within a compromised tissue should be taken into consideration. After all, since
tissues are complex mixes of different, interacting cell types which intimately
regulate each other's growth and development, it is clear that each cell type could
contribute (positively or negatively) towards the molecular mechanisms
which lie behind responses to external stimuli or neoplastic growth. It is perhaps
then more informative to carry out differential display experiments using in vivo as
opposed to in vitro models, where uniform populations of identical cells probably
represent a partial, skewed or even inaccurate picture of the molecular changes that
occur.

The possible implications of inter-individual biological variation
should be considered in any approach where whole animal models are being used. It
is clear that individuals (humans and animals) respond in different ways to identical
stimuli. One of the best characterized examples is the debrisoquine oxidation
polymorphism, which is mediated by cytochrome CYP2D6 and determines the
[...] commonly prescribed drugs (Lennard [...], Meyer and
Zanger 1997). The reasons for such differences are varied and complex, but allelic
variations, regulatory region polymorphisms and even physical and mental health
can all contribute to observed differences in individual responses. Thought
should, therefore, be given to the specific objectives of the study and to the possible
value of pooling starting material (tissue/mRNA). The effect of this can be
beneficial through the ironing out of exaggerated responses and unimportant minor
fluctuations of (mechanistically) irrelevant genes in individual animals, thus
providing a clearer overall picture of the general molecular mechanisms of the
response. However, at the same time such minor variations may be of utmost
importance in deciding the ability of individual animals to succumb to or resist the
effects of a given chemical/disease.



How efficient are differential expression techniques at recovering a high percentage of
differentially expressed genes?

A number of groups have produced experimental data suggesting that
mammalian cells produce between 8000-15000 different mRNA species at any one time
([...] 1981, Hedrick et al. 1984, Bravo 1990), although figures as
high as 20-30000 have also been quoted (Axel et al. 1976). Hedrick et al. (1984)
provided evidence suggesting that the majority of these belong to the rare abundance
class. A breakdown of this abundance distribution is shown in table 1.

When the results of differential display experiments have been compared with
data obtained previously using other methods, it is apparent that not all differentially
expressed mRNAs are represented in the final display. In particular, rare messages
(which, importantly, often encode regulatory proteins) are not easily recovered
using differential display systems. This is a major shortcoming, as the majority of
mRNA species exist at levels of less than 0.005% of the total population (table 1).
Bertioli et al. (1995) examined the efficiency of DD templates (heterogeneous
mRNA populations) for recovering rare messages and were unable to detect mRNA
species present at less than 1.2% of the total mRNA population, equivalent to an
intermediate or abundant species. Interestingly, when simple model systems (single
target only) were used instead of a heterogeneous mRNA population, the same
primers could detect levels of target mRNA down to 10000X smaller. These results
are probably best explained by competition for substrates from the many PCR
products produced in a DD reaction.
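
The size of this gap is easier to see with the arithmetic written out. The short Python sketch below uses only the three figures quoted above (0.005%, 1.2% and 10000X); the variable and function names are illustrative assumptions.

```python
# All values are percentages of the total mRNA population, as quoted in the text.
RARE_CLASS_CEILING_PCT = 0.005  # most mRNA species lie below this level
DD_LIMIT_PCT = 1.2              # lowest level detected in a heterogeneous population
SINGLE_TARGET_GAIN = 10_000     # improvement seen with single-target model systems


def fold_shortfall(detection_limit_pct, target_pct):
    """How many times more abundant than target_pct a message must be to be seen."""
    return detection_limit_pct / target_pct


def single_target_limit(detection_limit_pct, gain):
    """Detection limit when the same primers face a single target only."""
    return detection_limit_pct / gain


if __name__ == "__main__":
    shortfall = fold_shortfall(DD_LIMIT_PCT, RARE_CLASS_CEILING_PCT)
    limit = single_target_limit(DD_LIMIT_PCT, SINGLE_TARGET_GAIN)
    print(f"shortfall in a complex population: about {shortfall:.0f}-fold")
    print(f"single-target detection limit: {limit:.5f} % of total mRNA")
```

On these figures a message must be roughly 240 times more abundant than the rare-class ceiling to be detected in a complex population, whereas the single-target limit (0.00012%) falls well below that ceiling, consistent with competition between products rather than primer performance being the limiting factor.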

The numbers of differentially expressed mRNAs reported in the literature using
various model systems provide further evidence that many differentially expressed
mRNAs are not recovered. For example, DeRisi et al. (1997) used DNA array
technology to examine gene expression in yeast following exhaustion of sugar in the
medium, and found that more than 1700 genes showed a change in expression of at
least 2-fold. In light of such a finding, it would not be unreasonable to suggest that
of the 8000-15000 different mRNA species produced by any given mammalian cell,
up to 1000 or more may show altered expression following chemical stimulation.
Whilst this may be an extreme figure, it is known that at least 100 genes are
activated/upregulated in Jurkat (T-) cells following IL-2 stimulation (Ullman et al.
1990). In addition, Wan et al. (1996) estimated that interferon-γ-stimulated HeLa
cells differentially express up to 433 genes (assuming 24000 distinct mRNAs
expressed by the cells). However, there have been few publications documenting
anywhere near the recovery of these numbers. For example, in using DD to compare
normal and regenerating mouse liver, Bauer et al. (1993) found only 70 of 38000
total bands to be different. Of these, 50% (35 genes) were shown to correspond to
differentially expressed bands. Chen et al. (1996) reported 10 genes upregulated in
female rat liver following ethinyl estradiol treatment. McKenzie and Drake (1997)
identified 14 different gene products whose expression was altered by phorbol
myristate acetate (PMA, a tumour promoter agent) stimulation of a human
myelomonocytic cell line. Kilty and Vickers (1997) identified 10 different gene
products whose expression was upregulated in the peripheral blood leukocytes of
allergic disease sufferers. Linskens et al. (1995) found 23 genes differentially
expressed between young and senescent fibroblasts. Techniques other than DD
have also provided an apparent paucity of differentially expressed genes. Using SH,
for example, Cao et al. (1997) found 15 genes differentially expressed in colorectal
cancer compared to normal mucosal epithelium. Fitzpatrick et al. (1995) isolated 17
genes upregulated in rat liver following treatment with the peroxisome proliferator
clofibrate; Philips et al. (1990) isolated 12 cDNA clones which were upregulated in
highly metastatic mammary adenocarcinoma cell lines compared to poorly
metastatic ones. Prashar and Weissman (1996) used 3' restriction fragment analysis and
identified approximately 40 genes showing altered expression within 4 h of
activation of Jurkat T-cells. Groenink and Leegwater (1996) analysed 27 gene
fragments isolated using SSH of delayed early response phase of liver regeneration
and found only 12 to be upregulated.
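
To put some of these figures on a common scale, the sketch below expresses them as percentages. The counts are those quoted in the paragraph above; the labels and function names are illustrative only, and the two "suggested upper figure" rows simply apply the text's figure of 1000 genes to the lower and upper estimates of 8000 and 15000 mRNA species.

```python
def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# (label, number reported or estimated, total considered), all taken from the text.
REPORTED = [
    ("Bauer et al. (1993): DD bands that differed", 70, 38000),
    ("Wan et al. (1996): estimated differential genes", 433, 24000),
    ("Suggested upper figure, 8000 mRNA species", 1000, 8000),
    ("Suggested upper figure, 15000 mRNA species", 1000, 15000),
]


def summarise(rows):
    """Return (label, percentage rounded to 2 decimal places) for each row."""
    return [(label, round(percent(part, whole), 2)) for label, part, whole in rows]


if __name__ == "__main__":
    for label, pct in summarise(REPORTED):
        print(f"{label}: {pct}%")
```

The DD figure of 70 in 38000 bands is about a tenth of the proportion implied by the Wan et al. estimate, and far below the 7-12% implied by the suggested upper figure, which is the shortfall the paragraph describes.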

In the laboratory, SSH was used to isolate up to 70 candidate genes which appear 
to show altered expression in guinea pig liver following short-term treatment with 
the peroxisome proliferator, WY-14,643 (Rockett, Swales, Esdaile and Gibson 
unpublished observations). However, these findings have still to be confirmed by 
analysis of the extracted tissue mRNA for differential expression of these sequences. 

Whilst the latest differential display technologies are purported to include design 
and experimental modifications to overcome this lack of efficiency (in both the total 
number of differentially expressed genes recovered and the percentage that are true 
positives), it is still not clear if such adaptations are practically effective: proving 
efficiency by spiking with a known amount of limited numbers of artificial 
construct(s) is one thing, but isolating a high percentage of the rare messages already 
present in an mRNA population is another. Of course, some models will genuinely 
produce only a small number of differentially expressed genes. In addition, there are 
also technical problems that can reduce efficiency. For example, mRNAs may have 
an unusual primary structure that effectively prevents their amplification by PCR- 
based systems. In addition, it is known that under certain circumstances not all 
mRNAs have 3'polyA sites. For example, during Xenopus development, deadenyl- 
ation is used as a means to stabilize RNAs (Voeltz and Steitz 1998), whilst 
preferential deadenylation may play a role in regulating Hsp70 (and perhaps, 
therefore, other stress protein) expression in Drosophila (Dellavalle et al. 1994). The 
presence of deadenylated mRNAs would clearly reduce the efficiency of systems 
utilizing a polydT reverse transcription step. The efficiency of any system also 
depends on the quality of the starting material. All differential display techniques 
use mRNA as their target material. However, it is difficult to isolate mRNA that is 
completely free of ribosomal RNA. Even if polydT primers are used to prime first 
strand cDNA synthesis, ribosomal RNA is often transcribed to some degree 
(Clontech PCR-Select cDNA Subtraction kit user manual). It has been shown, at 
least in the case of SSH, that a high rRNA:mRNA ratio can lead to inefficient 
subtractive hybridization (Clontech PCR-Select cDNA Subtraction kit user 
manual), and there is no reason to suppose that it will not do likewise in other SH 
approaches. Finally, those techniques that utilise a presubtraction amplification step 
(e.g. RDA) may present a skewed representation since some sequences amplify 
better than others. 

Of course, probably the most important consideration is the temporal factor. It 
is clear that any given differential display experiment can only interrogate a cell at 
one point in time. It may well be that a high percentage of the genes showing altered 
expression at that time are obtained. However, given that disease processes and 
responses to environmental stimuli involve dynamic cascades of signalling, 
regulation, production and action, it is clear that all those genes which are switched 
on/off at different times will not be recovered and, therefore, vital information may 
well be missed. It is, therefore, imperative to obtain as much information about the 
model system beforehand as possible, from which a strategy can be derived for 
targeting specific time points or events that are of particular interest to the 
investigator. One way of getting round this problem of single time point analysis is 
to conduct the experiment over a suitable time course which, of course, adds 
substantially to the amount of work involved. 



How sensitive are differential expression technologies? 

There has been little published data that addresses the issue of how large the 
change in expression must be for it to permit isolation of the gene in question with 
the various differential expression technologies. Although the isolation of genes 
whose expression is changed as little as 1.5-fold has been reported using SSH 
(Groenink and Leegwater 1996), it appears that those demonstrating a change in 
excess of 5-fold are more likely to be picked up. Thus, there is a 'grey zone' 
in between where small changes could fade in and out of isolation between 
experiments and animals. DD, on the other hand, is not subject to this grey 
zone since, unlike SH approaches, it does not amplify the difference in expression 
between two samples. Wan et al. (1996) reported that differences in expression of 
twofold or more are detectable using DD. 
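
The sensitivity ranges described for SSH can be written down as a simple classifier. This is a minimal sketch rather than a validated rule: the function name and category labels are hypothetical, the 1.5-fold and 5-fold boundaries are taken from the text above, and treating down-regulation symmetrically (a 0.2-fold change counted as 5-fold) is an assumption made here.

```python
def ssh_detection_zone(fold_change: float) -> str:
    """Classify a fold change by how likely SSH is to recover the gene.

    Boundaries follow the text: changes of 1.5-fold have been reported,
    changes in excess of 5-fold are more likely to be picked up, and the
    region in between is the 'grey zone'.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    # Assumption: express down-regulation as the reciprocal fold change.
    magnitude = fold_change if fold_change >= 1 else 1.0 / fold_change
    if magnitude > 5:
        return "likely isolated"
    if magnitude >= 1.5:
        return "grey zone"
    return "unlikely isolated"


for change in (1.2, 1.5, 3.0, 5.0, 8.0, 0.1):
    print(change, ssh_detection_zone(change))
```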



Resolution and visualization of differential expression products 

It seems highly improbable with current technology that a gel system could be 
developed that is able to resolve all gene species showing altered expression in any 
system (be it SH- or DD-based). Polyacrylamide gel electrophoresis 
(PAGE) can resolve size differences down to 0.2% (Sambrook et al. 1989) and is 
used as standard in DD experiments. Even so, it is clear that a complex series of gene 
products such as those seen in a DD will contain unresolvable components. Thus 
what appears to be one band in a gel may in fact turn out to be several. Indeed, it has 
been well documented (Mathieu-Daude et al. 1996, Smith et al. 1997) that a single 
band extracted from a DD often represents a composite of heterogeneous products 
and the same has been found for SSH displays in this laboratory (Rockett et al. 
1997). One possible solution was offered by Mathieu-Daude et al. (1996) who 
extracted and reamplified candidate bands from a DD display and used single strand 
conformation polymorphism (SSCP) analysis to confirm which components 
represented the truly differentially expressed product. 

Many scientists try to avoid the use of PAGE where possible because it is 
technically more demanding than agarose gel electrophoresis (AGE). Unfortunately, 
high resolution agarose gels such as Metaphor (FMC, Lichfield, UK) and AquaPor 
HR (National Diagnostics, Hessle, UK), whilst easier to prepare and manipulate, 
can only separate DNA sequences which differ in size by around 
1.5-2% (15-20 base pairs for a 1 kb fragment). Thus, SSH, RDA or other such 
products which differ in size by less than this amount are normally not resolvable. 
However, a simple technique does in fact exist for increasing the resolving power of 
AGE— the inclusion of HA-red (10-phenyl neutral red-PEG ligand) or HA-yellow 
(bisbenzamide-PEG ligand) (Hanse Analytik GmbH, Bremen, Germany) in a 
gel separates identical or closely sized products on base content. Specifically, 
HA-red and -yellow selectively bind to GC and AT DNA motifs, respectively 
(Wawer et al. 1995, Hanse Analytik 1997, personal communication). Since both 
HA-stains possess an overall positive charge, they migrate towards the cathode 
when an electric field is applied. This is in direct opposition to DNA which 
is negatively charged and, therefore, migrates towards the anode. Thus, if two 
DNA clones are identical in size (as perceived on a standard high resolution 
agarose gel), but differ in AT/GC content, inclusion of a HA-dye in the gel 
will effectively retard the migration of one of the sequences compared to the 
other, effectively making it apparently larger and, thus, providing a means of 
differentiating between the two. The use of HA-red has been shown to resolve 
sequences with an AT variation of less than 1 % (Wawer et al. 1995), whilst Hanse 
Analytik have reported that HA staining is so sensitive that in one case it was used 
to distinguish two 567bp sequences which differed by only a single point mutation 
(Hanse Analytik 1996, personal communication). Therefore, if one wishes to check 
whether all the clones produced from a specific band in a differential display 
experiment are derived from the same gene species, a small amount of reamplified 
or digested clone can be run on a standard high resolution gel, and a second aliquot 
in a similar gel containing one of the HA-stains. The standard gel should indicate 
any gross size differences, whilst the HA-stained gel should separate otherwise 
identical species (on standard AGE) according to their base content. Geisinger 
et al. (1997) reported successful use of this approach for identifying DD-derived 
clones. Figure 10 shows such an experiment carried out in this laboratory on clones 
obtained from a band extracted from an SSH display. 

Figure 10. Discrimination of clones of identical/nearly identical size using HA-red. 
Bands of decreasing size (1-5) were extracted from [...] clones from each band [...]. 
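
The resolution limits quoted in this section (0.2% for PAGE, around 1.5-2% for high resolution agarose) can be converted into base pairs for any fragment length. The sketch below assumes that resolution scales linearly with fragment size, which is a simplification; the dictionary and function names are hypothetical and introduced only for illustration.

```python
# Resolution limits quoted in the text, as (low, high) percentages of fragment size.
RESOLUTION_PCT = {
    "PAGE": (0.2, 0.2),
    "high resolution agarose": (1.5, 2.0),
}


def min_resolvable_bp(fragment_bp: float, gel: str) -> tuple[float, float]:
    """Return the (low, high) smallest size difference in bp a gel can resolve."""
    if fragment_bp <= 0:
        raise ValueError("fragment_bp must be positive")
    low_pct, high_pct = RESOLUTION_PCT[gel]
    return fragment_bp * low_pct / 100, fragment_bp * high_pct / 100


for gel_type in RESOLUTION_PCT:
    low, high = min_resolvable_bp(1000, gel_type)
    print(f"{gel_type}: {low:g}-{high:g} bp at 1 kb")
```

For a 1 kb fragment this reproduces the 15-20 base pair figure given for agarose, against about 2 base pairs for PAGE.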

An alternative approach is to carry out a 2-D analysis of the differential display 
products. In this approach, size-based separation is first carried out in a standard 
gel. The [...] containing the display is then extracted and incorporated 
into a HA gel for resolution based on AT/GC content. 

Of course, one should always consider the possibility of there being different 
gene species which are the same size and have the same GC/AT content. However, 
even these species are not unresolvable given some effort. Again, one might use 
[...] a denaturing gradient gel electrophoresis (DGGE) or temperature 
gradient gel electrophoresis (TGGE) approach to resolve the contents of a band, 
[...] on the extracted band (Suzuki et al. 1991) or on the reamplified 
product. 

The requirement of some differential display techniques to visualize large 
numbers of products (e.g. DD and GEF) can also present a problem in that, in terms 
of numbers, the resolution of PAGE rarely exceeds 300-400 bands. One approach to 
[...] gels such as those described by Uitterlinden et al. [...]. 

Extraction of differentially expressed bands from a gel can be complex since, in 
some cases (e.g. DD, GEF), the results are visualized by autoradiographic means 
such that precise overlay of the developed film on the gel must occur if the correct 
band is to be extracted for further analysis. Clearly, a misjudged extraction can 
account for many man-hours lost. This problem, and that of the use of radioisotopes, 
has been addressed by several groups. For example, Lohmann et al. (1995) 
demonstrated that silver staining can be used directly to visualize DD bands in 
horizontal PAGs. An et al. (1996) avoided the use of radioisotopes by transferring a 
small amount (20-30%) of the DNA from their DD to a nylon membrane and 
visualizing the bands using chemiluminescent staining before going back to extract 
the remaining DNA from the gel. Chen and Peck (1996) went one step further and 
transferred the entire DD to a nylon membrane. The DNA bands were then 
visualized using a digoxigenin (DIG) system (DIG was attached to the polydT 
primers used in the differential display procedure). Differentially expressed bands 
were cut from the membrane and the DNA eluted by washing with PCR buffer prior 
to reamplification. 

One of the advantages of using techniques such as SSH and RDA is that the final 
display can be run on an agarose gel and the bands visualized with simple ethidium 
bromide staining. Whilst this approach can provide acceptable results, overstaining 
with SYBR Green I or SYBR Gold nucleic acid stains (FMC) effectively enhances 
the intensity and sharpness of the bands. This greatly aids in their precise extraction 
and often reveals some faint products that may otherwise be overlooked. Whilst 
differential displays stained with SYBR Green I are better visualized using short 
wavelength UV (254 nm) rather than medium wavelength (306 nm), the shorter 
wavelength is much more DNA damaging. In practice, it takes only a few seconds 
to damage DNA extracted under 254 nm irradiation, effectively preventing 
reamplification and cloning. The best approach is to overstain with SYBR Green I 
and extract bands under a medium wavelength UV transillumination. 



The possible use of 'microfingerprinting' to reduce complexity 

Given the sheer number of gene products and the possible complexity of each 
band, an alternative approach to rapid characterization may be to use an enhanced 
analysis of a small section of a differential display— a 'sub-fingerprint' or 'micro- 
fingerprint'. In this case, one could concentrate on those bands which only appear 
in a particular chosen size region. Reducing the fingerprint in this way has at least 
two advantages. One is that it should be possible to use different gel types, 
concentrations and run times tailored exactly to that region. Currently one might 
run products from 100-3000+ bp on the same gel, which leads to compromise in the 
gel system being used and consequently to suboptimal resolution, both in terms of 
size and numbers, and can lead to problems in the accurate excision of individual 
bands. Secondly, it may be possible to enhance resolution by using a 2-D analysis 
using a HA-stain, as described earlier. In summary, if a range of gene product sizes 
is carefully chosen to include certain 'relevant' genes, the 2-D system standardized, 
and appropriate gene analysis used, it may be possible to develop a method for the 
early and rapid identification of compounds which have similar or widely different 
cellular effects. If the prognosis for exposure to one or more other chemicals which 
display a similar profile is already known, then one could perhaps predict similar 
effects for any new compounds which show a similar micro-fingerprint. 






An alternative approach to microfingerprinting is to examine altered expression 
in specific families of genes through careful selection of PCR primers and/or post- 
reaction analysis. Stress genes, growth factors and/or their receptors, cell cycling 
genes, cytochromes P450 and regulatory proteins might be considered as candidates 
for analysis in this way. Indeed, some off-the-shelf DNA arrays (e.g. Clontech's 
Atlas cDNA Expression Array series) already anticipated this to some degree by 
grouping together genes involved in different responses e.g. apoptosis, stress, DNA- 
damage response etc. 



Screening 
False positives 



The generation of false positives has been discussed at length amongst the 
differential display community (Liang et al. 1993, 1995, Nishio et al. 1994, Sun et al. 
1994, Sompayrac et al. 1995). The reason for false positives varies with the 
technique being used. For instance, in RDA, the use of adaptors which have not 
been HPLC purified can lead to the production of false positives through illegitimate 
ligation events (O'Neill and Sinclair 1997), whilst in DD they can arise through 
artifacts and illegitimate transcription of rRNA. In SH, false positives appear 
to derive largely from abundant gene species, although some may arise from 
cDNA/mRNA species which do not undergo hybridization for technical reasons. 

A quick screening of putative differentially expressed clones can be carried out 
using a simple dot blot approach, in which labelled first strand probes synthesized 
from tester and driver mRNA are hybridized to an array of said clones (Hedrick et 
al. 1984, Sakaguchi et al. 1986). Differentially expressed clones will hybridize to 
tester probe, but not driver. The disadvantage of this approach is that rare species 
may not generate detectable hybridization signals. One option for those using SSH 
is to screen the clones using a labelled probe generated from the subtracted cDNA 
from which it was derived, and with a probe made from the reverse subtraction 
reaction (ClonTechniques 1997a). Since the SSH method enriches rare sequences 
it should be possible to confirm the presence of clones representing low abundance 
genes. Despite this quick screening step, there is still the need to go back to the 
original mRNA and confirm the altered expression using a more quantitative 
approach. Although this may be achieved using Northern blots, the sensitivity is 
poor by today's high standards and one must rely on PCR methods for accurate and 
sensitive determinations (see below). 



Sequence analysis 

The majority of differential display procedures produce final products which are 
between 100 and 1000 bp in size. However, this may considerably reduce the size of 
the sequence for analysis of the DNA databases. This in turn leads to a reduced 
confidence in the result: several families of genes have members whose DNA 
sequences are almost identical except in a few key stretches, e.g. the cytochrome 
P450 gene superfamily (Nelson et al. 1996). Thus, does the clone identified as being 
almost identical to gene X 0 really come from that gene, or its brother gene X or its 
as yet undiscovered sister X 2? For example, using SSH, part of a gene was isolated 
which was up-regulated in the liver of rats exposed to Wy-14,643 and was identified 
by a FASTA search as being transferrin (data not shown). However, transferrin is 
known to be downregulated by hypolipidemic peroxisome proliferators such as Wy- 
14,643 (Hertz et al. 1996), and this was confirmed with subsequent RT-PCR 
analysis. This suggests that the gene sequence isolated may belong to a gene which 
is closely related to transferrin, but is regulated by a different mechanism. 

A further problem associated with SH technology is redundancy. In most cases, 
before SH is carried out, the cDNA population must first be simplified by restriction 
digestion. This is important for at least two reasons: 

(1) To reduce complexity: long cDNA fragments may form complex networks 
which prevent the formation of appropriate hybrids, especially at the high 
concentrations required for efficient hybridization. 

(2) Cutting the cDNAs into small fragments provides better representation of 
individual genes. This is because genes derived from related but distinct 
members of gene families often have similar coding sequences that may 
cross-hybridize and be eliminated during the subtraction procedure (Ko 1990). 
Furthermore, different fragments from the same cDNA may differ considerably 
in terms of hybridization and amplification and, thus, may not efficiently do one 
or the other (Wang and Brown 1991). Thus, some fragments from differentially 
expressed cDNAs may be eliminated during subtractive hybridization procedures. 
However, other fragments may be enriched and isolated. As a 
consequence of this, some genes will be cut one or more times, giving rise to two 
or more fragments of different sizes. If those same genes are differentially 
expressed, then two or more of the different size fragments may come through 
as separate bands on the final differential display, increasing the observed 
redundancy and increasing the number of redundant sequencing reactions. 

Sequence comparisons also throw up another important point: at what degree 
of sequence similarity does one accept a result? Is 90% identity between a gene 
derived from your model species and another acceptably close? Is 95% between 
your sequence and one from the same species also acceptable? This problem is 
particularly relevant when the forward and reverse sequence comparisons give 
similar sequences with completely different gene species! An arbitrary decision 
seems to be to allocate genes that are definite (95% and above similarity) and then 
group those between 60 and 95% as being related or possible homologues. 
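
The arbitrary cut-offs just described amount to a simple three-way classification of database hits. The sketch below shows one way of encoding them; the function name and labels are hypothetical, and the "unassigned" category for matches below 60% is an assumption, since the text does not say how such hits are treated.

```python
def classify_match(identity_pct: float) -> str:
    """Assign a database hit to a confidence group by percentage similarity."""
    if not 0 <= identity_pct <= 100:
        raise ValueError("identity_pct must lie between 0 and 100")
    if identity_pct >= 95:
        return "definite"
    if identity_pct >= 60:
        return "related or possible homologue"
    # Assumption: the source gives no rule for matches below 60%.
    return "unassigned"


for pct in (99.0, 95.0, 90.0, 60.0, 45.0):
    print(f"{pct:5.1f}% -> {classify_match(pct)}")
```

A fixed rule of this kind does not settle the case raised above, where forward and reverse reads match different genes; such clones would still need to be inspected individually.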



Quantitative analysis 

At some point, one must give consideration to the quantitative analysis of the 
candidate genes, either as a means of confirming that they are truly differentially 
expressed, or in order to establish just what the differences are. Northern blot 
analysis is a popular approach as it is relatively easy and quick to perform. However, 
the major drawback with Northern blots is that they are often not sensitive enough 
to detect rare sequences. Since the majority of messages expressed in a cell are of low 
abundance (see table 1), this is a major problem. Consequently, RT-PCR may be the 
method of choice for confirming differential expression. Although the procedure is 
somewhat more complex than Northern analysis, requiring synthesis of primers and 
optimization of reaction conditions for each gene species, it is now possible to set up 
high throughput PCR systems using multichannel pipettes, 96+-well plates and 
appropriate thermal cycling technology. [...] desire, being more [...] 
internal standard, the money and time needed to develop [...], 
especially when one might be [...] of gene [...]. The 
use of semi-quantitative [...] involved. One 
must first of all choose an internal [...] 
compared to the controls. [...] for 
example interferon-gamma (IFN-γ, [...]), 
glyceraldehyde-3-phosphate dehydrogenase (GAPDH, [...] 1994), 
dihydrofolate reductase (DHFR, Mohler and Butler [...]), β-2-microglobulin (β-2-m, 
Murphy et al. 1990), hypoxanthine [...] (HPRT, Foss et 
al. 1998) and a number of [...] (ClonTechniques 1997b). Ideally, an internal 
standard should not change its level [...] 
stage in the cell cycle or through [...]. However, it has been 
shown on numerous occasions that the levels of [...] housekeeping genes currently 
used by the research community do in fact [...] under certain conditions and in 
different tissues (ClonTechniques [...]). [...] therefore, that preliminary 
experiments be carried out on [...] housekeeping genes to establish 
their suitability for use in the model system. 
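
The role of the internal standard in semi-quantitative RT-PCR can be illustrated with a short calculation. This is a sketch under stated assumptions, not the procedure used in any of the cited studies: the function names, the band-intensity inputs and the 20% tolerance for judging whether a housekeeping gene is stable are all hypothetical.

```python
def normalised_fold_change(target_treated: float, ref_treated: float,
                           target_control: float, ref_control: float) -> float:
    """Fold change of a target gene after normalising each sample to its
    internal standard (e.g. a housekeeping gene band intensity)."""
    values = (target_treated, ref_treated, target_control, ref_control)
    if any(v <= 0 for v in values):
        raise ValueError("all intensities must be positive")
    return (target_treated / ref_treated) / (target_control / ref_control)


def reference_is_stable(ref_treated: float, ref_control: float,
                        tolerance: float = 0.2) -> bool:
    """Check that the internal standard itself changes by no more than
    `tolerance` (a hypothetical 20% by default) between treated and control."""
    if ref_treated <= 0 or ref_control <= 0:
        raise ValueError("intensities must be positive")
    return abs(ref_treated / ref_control - 1.0) <= tolerance


# Hypothetical band intensities in arbitrary units.
print(normalised_fold_change(300, 100, 100, 100))
print(reference_is_stable(110, 100))
print(reference_is_stable(200, 100))
```

If the stability check fails for a candidate housekeeping gene, the normalised ratio is not meaningful for that model system, which is the reason for testing candidates in preliminary experiments.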

Interpretation of quantitative data 

[...] treated [...] caution. [...] comparing the lists of genes identified by 
differential expression, one can perhaps gain insight into why two different 
species [...] in different ways to external stimuli. For example, rats and mice 
[...] non-genotoxic effects of a wide range of peroxisome proliferators whilst 
[...] resistant (Orton et al. 1984, [...] 1987, Lake et al. 1989, 1993, 
Makowska et al. 1992). A simplified [...] why is to compare lists of up- 
[...] those [...] are expressed in only one species. [...] the said gene 
might suggest a mechanism [...] non-genotoxic carcinogenesis or protection. 
Of course, the situation is likely to be [...]. Perhaps if there were one key 
gene protecting guinea pigs from non-genotoxic effects and it was upregulated 
50 times by PPs, the same [...] times in the rat. However, since both were 
noted [...] upregulated, the importance of the gene may be overlooked. 
[...] For example, what is the true relevance of gene Y, which [...] after a 
particular treatment, and gene Z, which shows only a 5-fold increase? [...] 
the literature may find that historically, gene Y has often [...] up-regulated 
40-60-fold by a number of unrelated stimuli [...] 50-fold 
increase would appear less significant. However, the literature [...] show that 
gene Z has never been recorded as having more than doubled in expression, 
which [...] your 5-fold increase [...] interesting is if that same 5-fold increase 
has only been seen in [...] neoplasia or following treatment with related 
chemicals. 

Problems in using the differential display approach 

[...] of an easily obtainable [...] a developmental process or [...] too complex 
[...]. [...] differential display techniques have common and/or unique 
technical problems which preclude the isolation and identification of all those 
genes which show changes in expression. Furthermore, there are [...] 
which differential expression analysis is [...] not [...]. An [...] 
of this is the presence of small deletions, insertions, or point mutations such as those 
seen in activated oncogenes [...] and individual polymorphisms. 
Polymorphic variations [...] usually [...] 
regarded as being [...] explaining why some patients 
respond better than others to certain [...] (and, by logical extension, why 
some people are less affected by potentially dangerous [...] than 
others). The identification of such [...] occurring 
polymorphisms requires the subsequent application of [...] SSCP, DGGE 
or TGGE to the gene of interest. [...] not designed 
to address issues such as alternatively spliced [...] 
abundance of mRNA is a result of increased transcription or increased mRNA 
stability. 



Conclusions 



[...] genes which are differentially expressed, since they are [...] genes which 
demonstrate altered expression. This means that [...] isolation of 
previously unknown genes which may turn out to be [...] markers of a particular 
state or condition. At least one open system (SAGE) [...] quantitative, thus 
eliminating the need to return to the original mRNA and carry out Northern/PCR 
analysis to confirm the result. However, the [...] progress of genome mapping 
projects means that over the next 5-10 years [...] of open [...] 
will switch from open to [...] DNA 
arrays. Arrays are easier and faster to [...] quantitative data, are 
suitable for high throughput analysis and can be [...] 
pathways or families of genes. [...] 
common laboratory animals comb[...] DNA [...] technology, 
[...] that it will soon no longer be necessary [...] 
genes using the technically [...]. Thus, their 
main advantage (that of identifying [...]) will be largely eradicated. It is 
likely, therefore, that their sphere [...] 
less common laboratory [...] before the genomes of 
such animals as zebrafish, electric eels, gerbils, crayfish and squid, for example, will 
be sequenced. 

[...] remain: What is the functional/biological [...]? One 
persistent problem is understanding [...] differentially expressed genes are a 
cause or consequence of the altered state. Furthermore, [...] chemicals, [...] as 
non-genotoxic carcinogens, are also mitogens [...] 
will also be upregulated [...] 
carcinogenic effect. Whilst differential display technology cannot hope to answer 
these questions, it does provide a springboard from which identification [...] 
and functional studies can be launched. Understanding the molecular mechanism of 
cellular responses is almost impossible without knowing the regulation and function 
of those genes and their condition (e.g. mutated). In an abstract sense, differential 
display can be likened to a still photograph, showing details of a fixed moment in 
time. Consider the historian who knows the outcome of a battle and the placement 
and condition of the troops before the battle commenced, but is asked to try and 
deduce how the battle progressed and why it ended as it did from a few still 
photographs: an impossible task. In order to understand the battle, the historian 
must find out the capabilities and motivation of the soldiers and their [...] 
officers, what the orders were and whether they [...] 
terrain, the remains of the battle and consider the effects the prevailing weather 
conditions exerted. Likewise, if mechanistic answers are to be forthcoming, the 
scientist must use differential display in combination with other techniques such as 
knockout technology, the analysis of cell signalling pathways, mutation [...] 
time and dose response analyses. Although this review has emphasized the 
importance of differential gene profiling, it should not be considered in isolation and 
the full impact of this approach will be strengthened if used in combination with 
functional genomics and proteomics (2-dimensional protein gels from isoelectric 
focusing and subsequent SDS electrophoresis and virtual 2D-maps using capillary 
electrophoresis). Proteomics is attracting much recent attention as many of the 
changes resulting in differential gene expression do not involve changes in mRNA 
levels, as described extensively herein, but rather protein-protein, protein-DNA and 
protein phosphorylation events which would require functional genomics or 
proteomic technologies for investigation. 

Despite the limitations of differential display technology, it is clear that many 
potential applications and benefits can be obtained from [...] 
measurable. Amongst other things, such fingerprints could indicate the family or 
even specific type of chemical an individual has been exposed to plus the length 
and/or acuteness of that exposure, thus indicating the most prudent treatment. 
They may also help uncover differences in histologically identical cancers, provide 
[...] stages of neoplasia [...]. 

The Human Genome Project will be completed early in the next century and the 
DNA sequence of all the human genes will be known. The continuing development 
and evolution of differential gene expression technology will ensure that this 
knowledge contributes fully to the understanding of human disease processes. 

SUMMARY 



The technique of differential display reverse transcription-polymerase chain reaction (ddRT-PCR) has been used to produce unique 
profiles of up-regulated and down-regulated gene expression in the liver of male Wistar rats following short term exposure to the 
non-genotoxic hepatocarcinogens, phenobarbital and WY-14,643. Animals were treated for 3 days, whereupon their livers were 
extracted and snap frozen. mRNA was prepared from the livers and used for ddRT-PCR. Individual bands from the differential 
displays were extracted and cloned. False positives were eliminated by dotblot screening and true positives then sequenced and 
identified. 



INTRODUCTION 

Safety evaluation of new chemicals usually necessi- 
tates the examination of genotoxic and carcinogenic 
potential using short-term in vitro and in vivo geno- 
toxicity assays augmented by chronic bioassay tests. 
The short-term assays have proved useful in the early 
identification of potential genotoxic carcinogens, but 
their value is limited by observations which suggest 
that approximately 60% of chemicals identified as car- 
cinogens in life-exposure studies produce mainly 
negative findings in short-term genotoxicity tests (1,2). 
Thus, there is currently no reliable and rapid means of 
evaluating the carcinogenic risk of new chemicals 
which fall into this latter group of compounds, termed 
non-genotoxic (or epigenetic) carcinogens. 



Please send reprint requests to: Dr John Rockett, Molecular 
Toxicology Group, School of Biological Sciences, University 
of Surrey, Guildford, Surrey GU2 5XH, UK. 



It is now evident that non-genotoxic carcinogens 
constitute a group of chemicals which are not only di- 
vergent in their interspecies toxicity, but also demon- 
strate different target organ selectivities and mecha- 
nisms of action (3,4). Elucidation of the molecular 
mechanisms underlying non-genotoxic carcinogenesis 
is currently underway, but the picture is still far from 
complete. It is anticipated that a better understanding 
of the early changes in genetic expression following 
exposure to non-genotoxic carcinogens will aid devel- 
opment of experimental strategies to identify cellular 
markers which are diagnostic for this type of toxicity. 

Subtractive ddRT-PCR is a recently developed technique which facilitates the 
preferential amplification of gene products that demonstrate altered expression 
in target tissue(s) following exposure to chemical stimuli. Furthermore, using this 
technique, no prior knowledge of the specific genes which are up/down regulated is 
required. In the current study, we have undertaken to develop a specific and rapid 
assay for non-genotoxic carcinogens using the technique of ddRT-PCR. This has 
allowed us to identify characteristic patterns of gene regulation following 
administration of two different non-genotoxic carcinogens (phenobarbital and 
Wy-14,643) and the subsequent identification of individual gene species which are 
regulated by this xenobiotic treatment. 



MATERIALS AND METHODS 
Animals and treatment 

Phenobarbital (BDH, Poole, UK; 100 mg/kg/day) or
[4-chloro-6-(2,3-xylidino)-2-pyrimidinylthio] acetic
acid (Wy-14,643) (Campo, Emmerich; 250
mg/kg/day) was administered by gavage to groups of
3 male Wistar rats (150-200 g) on three consecutive 
days, whilst control animals received nothing. All ani- 
mals had free access to food (rat and mouse standard 
diet, B&K Universal, Hull, UK) and water. The ani- 
mals were killed on the fourth day, whereupon their 
livers were excised, sliced into 0.5 cm cubes, snap fro- 
zen in liquid nitrogen and then stored at -70°C. 

mRNA extraction 

Up to 0.25 g of each frozen liver sample was ground 
under liquid nitrogen using a mortar and pestle. 
mRNA was extracted from the ground liver using 
Promega's PolyATtract® System 1000 (Promega,
Madison, WI, USA) according to the technical manual.
The mRNA was DNase-treated (Promega, final
concentration 10 U/ml) before phenol/chloroform
extraction and ethanol precipitation. The mRNA was
resuspended at a final concentration of 500-1000 ng/µl.

ddRT-PCR 

This was carried out using the PCR-Select™ cDNA 
Subtraction Kit (Clontech, Palo Alto, CA, USA) ac- 
cording to the manufacturer's instructions. Final PCR 
reactions were run on a 2% Metaphor agarose (FMC,
Rockland, ME, USA) gel containing ethidium bromide
(Sigma, Dorset, UK) and then overstained for 30
min with SYBR Green I DNA stain (FMC, 1:10 000
dilution in TAE).



Band extraction and cloning

Each discernible band from the differential display
pattern was extracted from the gel with a scalpel and
the DNA eluted using a Genelute™ Agarose Spin Column
(Supelco, Bellefonte, USA). An aliquot of the eluted
DNA (5 µl) was re-amplified using the original ddRT-PCR
nested primers and electrophoresed on a 2%
agarose gel. The re-amplified band was extracted from
the gel (as above) and the eluted DNA ligated directly
into the TOPO TA Cloning® vector (Invitrogen,
Carlsbad) before transformation in Escherichia coli
TOP10F' One Shot™ cells (Invitrogen).

Stage I screening

Twelve transformed (white) colonies from each band
were grown up for 6 h in 200 µl LB broth containing
ampicillin (Sigma, 50 µg/ml) and 1 µl of this amplified
by PCR reaction (as specified in ddRT-PCR technical
manual). One quarter of the completed reaction
was electrophoresed on a standard 2% agarose gel and
one quarter on a 2% agarose gel containing HA Yellow
(Hanse Analytik GmbH, Bremen, Germany, 1
U/µl) to discern the different cloning products. The
remainder was used to prepare duplicate dotblots on
Hybond N+ (nylon) membranes (Amersham, Little
Chalfont, UK). Cultures containing different cloning
products were grown up and a plasmid miniprep prepared
from each (Wizard Plus SV Minipreps DNA Purification
System, Promega) according to the manufacturer's
instructions.

Stage II screening

The duplicate dotblots were probed with: (a) the final
differential display reaction; and (b) the 'reverse-subtracted'
differential display reaction. To make the
'reverse-subtracted' probe, the subtractive hybridisation
step of the ddRT-PCR procedure was carried out using
the original tester cDNA as a driver and the driver as
a tester. Probing and visualisation were carried out
using the ECL Direct Nucleic Acid Labelling and Detection
System (Amersham) according to the manufacturer's
instructions. Those clones which were positive
for (a) but negative for (b), or showed a substantially
larger positive signal with (a) compared to (b), were
chosen for further analysis.



DNA sequencing 

Positive clones as identified above were sequenced on 
an automated ABI DNA sequencer (Applied Biosys- 
tems, Warrington, UK). 



J. C. Rockett et al., Hepatocarcinogenesis and ddRT-PCR





Fig. 1 : (A) Subtractive ddRT-PCR patterns obtained from rat liver following 3-day treatment with Wy-14,643 or phenobarbital. Lane
1, 1 kb ladder; lane 2, genes up-regulated following Wy-14,643 treatment; lane 3, genes down-regulated following
Wy-14,643 treatment; lane 4, genes up-regulated following phenobarbital treatment; lane 5, genes down-regulated following
phenobarbital treatment; and lane 6, 1 kb ladder. (B) Subtractive ddRT-PCR patterns obtained from rat liver showing the
changes when phenobarbital-treated mRNA is subtracted from Wy-14,643-treated mRNA and vice versa. Lane 1, 1 kb
ladder; lane 2, genes showing increased expression following Wy-14,643 treatment compared to phenobarbital treatment;
lane 3, genes showing increased expression following phenobarbital treatment compared to Wy-14,643 treatment. See
Materials and Methods for further details.





Fig. 2 : Re-amplified ddRT-PCR products which were down-regulated following phenobarbital treatment (up-regulated bands were also
re-amplified but gel not shown). Individual DNA bands excised from the gel of ddRT-PCR reactions were extracted,
re-amplified and run on agarose gels to confirm amplification of the correct band (numbered). See Materials and Methods for
further details.
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Rat mRNA for 3-oxoacyl-CoA thiolase 
Rat hemopexin mRNA
R. rattus alpha-2u-globulin mRNA
M. musculus mRNA for C1 inhibitor
Rat electron transfer flavoprotein
Mouse topoisomerase 1 (Topo 1) mRNA
Soares 2NbMT M. musculus (EST)
Rat alpha-2u-globulin (s-type) mRNA
Soares mouse NML M. musculus (EST)
Soares p3NMF19.5 M. musculus (EST)
Soares mouse NML M. musculus (EST)
NCI_CGAP_Pr1 H. sapiens (EST)
R. norvegicus mRNA for ribosomal protein
Soares mouse embryo NbME135 (EST)
Rat fibrinogen B-beta-chain
Rat apolipoprotein E gene
Soares p3NMF19.5 M. musculus (EST)
Stratagene mouse testis (EST)
R. norvegicus RASP 1 mRNA
Soares mouse mammary gland (EST)



Table II : Rat liver genes up-regulated by phenobarbital treatment

Band number
(approximate size in bp)

5 (1300)
7 (1000)

8 (950)
10 (850)

11 (600)

12 (750)

15 (600)

16 (550)
21 (350)

Phenobarbital up-regulated

Highest sequence homology

Clone 1
Clone 2

Clone 1
Clone 2

93.5%
95.1%

98.3%
95.7%
94.9%
75.3%
93.8%

92.9%

95.2%
93.6%
99.3%

FASTA-EMBL gene identification

Rat cytochrome P450IIB1

mRNA for rat preproalbumin

Rat serum albumin mRNA

NCI_CGAP_Pr1 H. sapiens (EST)

Rat cytochrome P450IIB1

Rat cytochrome P450IIB1

Rat cytochrome p450-L (p450IIB2)

Rat TRPM-2 mRNA
Rat mRNA for sulfated glycoprotein
mRNA for rat preproalbumin
Rat serum albumin mRNA
Rat cytochrome P450IIB1
Rat haptoglobulin mRNA partial alpha
R. norvegicus genes for 18S, 5.8S & 28S rRNA
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Identification of differentially-regulated 
genes

Gene sequences were identified using the FASTA programme
(http://www.ebi.ac.uk/htbin/fasta.py?request)
to search all EMBL databases for matching DNA
sequences.



RESULTS 

Figure 1A,B shows the ddRT-PCR patterns of genes
showing altered expression in rat liver following 3-day
treatment with phenobarbital or Wy-14,643. Individual
bands were isolated from the phenobarbital-modulated
patterns (both up- and down-regulated), re-amplified
(Fig. 2), cloned, screened for false positives and then
identified. Those xenobiotic-modulated gene products
identified to date are listed in Tables I and II.



DISCUSSION 

The advent of combinatorial chemistry has led to the
synthesis of millions of new chemical compounds,
many of which may be potentially useful in pharmaceutical,
agricultural or industrial applications. However,
whilst there are tests available for those possessing
genotoxic activity, there remains no short-term assay
able to identify those chemicals which may belong to
the non-genotoxic group of carcinogens.

We have used an adaptation of the subtractive hybridisation
method - ddRT-PCR - to produce characteristic
profiles or 'fingerprints' of those genes which
are up-regulated or down-regulated in male rat liver
following acute exposure to test chemicals. The ddRT-PCR
profiles are characteristic and unique for each of
the 2 compounds studied to date.

A number of those gene species showing altered 
expression following phenobarbital treatment have 
been cloned and identified (Tables I & II). It is inter- 
esting to note the presence of CYP2B2 in the up-regu- 
lated genes. This would, of course, be expected fol- 
lowing exposure to phenobarbital and serves as a posi- 
tive control for the method. Other genes which one 
might normally expect to be up-regulated do not ap- 
pear in the table. However, it should be noted that not 
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all bands seen on the differential display were ex- 
tracted and re-amplified due to their being too faint or 
too close to other bands to accurately excise. Further- 
more, it has been well documented [(5) and references 
therein] that a single band extracted from a differential 
display often represents a composite of heterogeneous 
products. We are currently examining new methods to:
(i) improve resolution of the differential display patterns
(including 2-D agarose gels); and (ii) distinguish
those ddRT-PCR products which are identical in size,
but different in sequence.

Our future efforts will be directed towards deter- 
mining the extent of modulation of a number of the 
genes reported herein using semi-quantitative RT- 
PCR. This should reveal the extent of changes in ex- 
pression of key gene products which may be involved 
in non-genotoxic hepatocarcinogenesis and thus help 
increase understanding of this process. Furthermore, it 
is anticipated that aligning ddRT-PCR profiles of dif- 
ferent non-genotoxic agents found in responsive and 
non-responsive species may enable identification of 
those genes which are mechanistically relevant to the 
non-genotoxic hepatocarcinogenic process. Accord- 
ingly, this approach lends itself well to the identifica- 
tion, characterisation and sub-classification of possible 
different classes of non-genotoxic carcinogens. 
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Use of suppression-PCR subtractive hybridisation to identify 
genes that demonstrate altered expression in male rat and 
guinea pig livers following exposure to Wy-14,643, a
peroxisome proliferator and non-genotoxic hepatocarcinogen 
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Abstract 

Understanding the genetic profile of a cell at all stages of normal and carcinogenic development should provide an 
essential aid to developing new strategies for the prevention, early detection, diagnosis and treatment of cancers. We 
have attempted to identify some of the genes that may be involved in peroxisome-proliferator (PP)-induced 
non-genotoxic hepatocarcinogenesis using suppression PCR subtractive hybridisation (SSH). Wistar rats (male) were 
chosen as a representative susceptible species and Duncan-Hartley guinea pigs (male) as a resistant species to the 
hepatocarcinogenic effects of the PP, [4-chloro-6-(2,3-xylidino)-2-pyrimidinylthio] acetic acid (Wy-14,643). In each
case, groups of four test animals were administered a single daily dose of Wy-14,643 (250 mg/kg per day in corn oil) by
gastric intubation for 3 consecutive days. The control animals received corn oil only. On the fourth day the animals
were killed and liver mRNA extracted. SSH was carried out using mRNA extracted from the rat and guinea pig
livers, and used to isolate genes that were up and downregulated following Wy-14,643 treatment. These genes
included some predictable (and hence positive control) species such as CYP4A1 and CYP2C11 (upregulated and 
downregulated in rat liver, respectively). Several genes that may be implicated in hepatocarcinogenesis have also been 
identified, as have some unidentified species. This work thus provides a starting point for developing a molecular 
profile of the early effects of a non-genotoxic carcinogen in sensitive and resistant species that could ultimately lead 
to a short-term assay for this type of toxicity. © 2000 Elsevier Science Ireland Ltd. All rights reserved. 

Keywords: Wy-14,643; Peroxisome proliferator; Non-genotoxic hepatocarcinogenesis; Suppression PCR subtractive hybridisation; 
RT-PCR; Rat; Guinea pig; Gene regulation; Differential gene display; Gene profiling 
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Introduction 

The advent of combinatorial chemistry and
computer-aided drug design has led to a recent
upsurge in the number of chemical compounds
that have potential therapeutic, agricultural and
industrial applications. Although it has been suggested
that the contribution of synthetic chemicals
to the overall incidence of human cancer is low,
there still remains an absolute requirement to
evaluate all new chemicals for toxic and carcinogenic
potential. The latter is one of the most
problematic areas of chemical safety evaluation
and is usually carried out using short-term in vitro
and in vivo genotoxicity assays augmented by
chronic bioassay tests. The short-term assays have
proved useful in the early identification of potential
genotoxic carcinogens, but their value is limited
by observations that suggest that
approximately 60% of chemicals identified as carcinogens
in life-exposure studies produce mainly
negative findings in short-term genotoxicity tests
(Ashby, 1992; Parodi, 1992). Thus, there is currently
no reliable and rapid means of evaluating
the carcinogenic risk of new chemicals that fall
into this latter group of compounds, termed
non-genotoxic (or epigenetic) carcinogens.

One approach to addressing this problem is to
elucidate the molecular mechanisms by which
known non-genotoxic carcinogens act. It should
then be possible to identify common factors/mechanisms
that can serve as early biomarkers of
carcinogenic potential for new chemicals. To this
end, a large number of groups have reported on
the various effects of non-genotoxic compounds
in various animal species (Marsman et al., 1988;
[...]e et al., 1993; Cattley et al., 1994; Hayashi et
al., 1994; Human and Experimental Toxicology,
1994; Anderson et al., 1996). However, the mechanistic
picture is still far from complete with many
of those genes involved in the carcinogenic process
remaining unknown, and their identification
therefore remains a key goal in elucidating the
molecular mechanisms by which non-genotoxic
carcinogenesis occurs.

Subtractive hybridisation (SH) and related technologies
such as representational difference analysis
(RDA) (Hubank and Schatz, 1994) and
differential display (DD) (Liang and Pardee,
1992) can be used to aid the isolation of genes
showing altered expression in target tissues following
exposure to a chemical stimulus. These
techniques can also be used to identify differential
gene expression in neoplastic and normal cells
(Liang et al., 1992), infected and normal cells
(Duguid and Dinauer, 1990), differentiated and
undifferentiated cells (Sargent and Dawid, 1983;
Guimaraes et al., 1995), activated and dormant
cells (Gurskaya et al., 1996; Wan et al., 1996) and
different cell types (Hedrick et al., 1984; Davis et
al., 1984), amongst others. Most importantly, using
such approaches, no prior knowledge of the
specific genes that are upregulated/downregulated
is required.

Using a variation of SH, termed suppression- 
PCR subtractive hybridisation (SSH) (Diatchenko 
et al., 1996), we have previously reported the 
isolation of a number of genes showing altered 
expression in male rat liver following acute expo- 
sure to phenobarbital (Rockett et al., 1997). In 
the current work we have used the same experi- 
mental approach to isolate genes that are differen- 
tially expressed in the livers of male rats and 
guinea pigs following short-term (3-day) exposure
to the peroxisome proliferator (PP) and non-genotoxic
hepatocarcinogen, Wy-14,643. We have
isolated and identified a number of gene species,
some of which may be important in the induction 
of, or protection against, non-genotoxic 
hepatocarcinogenesis. 



2. Materials and methods 

2.1. Animals and treatment 

All animal experiments were undertaken in ac- 
cordance with Her Majesty's Home Office De- 
partment guidelines under the auspices of 
approved personal and project licences. Male 
Wistar rats (150-200 g) and male Duncan-Hartley
guinea pigs (250-300 g) were obtained from
Kingman and Bantam (Hull, UK). Upon receipt,
the animals of each species were randomly assigned
to two groups of four. They were maintained on a rat,
mouse or guinea pig standard diet (B&K Universal,
Hull) and a daily cycle of alternating 12-h
periods of dark and light. The room temperature
was maintained at 19°C and the relative humidity at
55%. The animals were acclimatised to this environment
for 7 days before treatment commenced.
[4-chloro-6-(2,3-xylidino)-2-pyrimidinylthio] acetic 
acid (Wy-14,643, Campo, Emmerich; 250 mg/kg 
per day in corn oil) was administered by gavage 
to the treated groups of rats and guinea pigs on 3 
consecutive days, whilst control groups received 
an equal volume of corn oil only. During this 
time, all animals had free access to food and 
water. The animals were killed by cervical disloca- 
tion on the fourth day, and their livers immedi- 
ately excised, weighed, sliced into approximately 
0.5-cm cubes, snap frozen in liquid nitrogen and 
stored at -70°C.
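
As a quick check of the dosing regime above, the per-animal daily dose implied by 250 mg/kg per day can be worked out from the stated body-weight ranges. The following is a minimal sketch; the function name `daily_dose_mg` is an illustrative assumption and not part of the original protocol.

```python
def daily_dose_mg(body_weight_g: float, dose_mg_per_kg: float = 250.0) -> float:
    """Return the daily gavage dose (mg) for one animal of the given weight."""
    return dose_mg_per_kg * body_weight_g / 1000.0


# Body-weight ranges (g) as quoted in the text for each species.
weight_ranges_g = {"rat": (150, 200), "guinea pig": (250, 300)}

for species, (low, high) in weight_ranges_g.items():
    print(f"{species}: {daily_dose_mg(low):.1f}-{daily_dose_mg(high):.1f} mg per day")
```

Under these figures a 150-200 g rat would receive 37.5-50 mg per day and a 250-300 g guinea pig 62.5-75 mg per day.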

2.2. mRNA extraction 

Approximately 0.25 g of each frozen liver sam- 
ple was ground under liquid nitrogen using a 
mortar and pestle. Messenger RNA was extracted 
from the ground liver using the PolyATtract® 
System 1000 kit (Promega, Madison, USA) ac- 
cording to the technical manual provided by the 
manufacturers. The mRNA was DNase-treated
(RQ1 RNase-free DNase, Promega, final concentration
10 U/ml) before phenol/chloroform extraction
and ethanol precipitation. The mRNA was
redissolved at a final concentration of 500-1000 ng/µl.

2.3. cDNA Subtraction 

This was carried out using the PCR-Select™ 
cDNA Subtraction Kit (Clontech, Palo Alto, 
USA) according to the manufacturer's instruc- 
tions. Subtractions were carried out with mRNAs 
derived from single animals. The mRNA from the 
remaining three animals in each group was later 
used for quantitative RT-PCR analysis of specific 
genes. 

2.4. Band extraction and cloning 

The secondary PCR reactions from the cDNA
subtraction procedure were run on a 2%
Metaphor agarose gel (FMC, Rockland, USA)
containing 0.5 µg/ml ethidium bromide (Sigma,
Dorset, UK). 1× TAE (0.04 M Tris-acetate,
0.001 M EDTA) was used to prepare the gel
and as the running buffer. After running for 6-7
h at 3.75 V/cm, the gel was overstained for 30 min
with SYBR Green I DNA stain (FMC, 1:10000
dilution in 1× TAE). Each discernible band of
the differential display pattern was extracted from
the gel with a scalpel and the DNA eluted using a
Genelute™ agarose spin column (Supelco, Bellefonte,
USA). Five microlitres of the eluted DNA
was reamplified using the original nested (secondary)
PCR primers supplied with the PCR-Select™
cDNA subtraction kit. The PCR products
were electrophoresed on a 2% standard agarose
gel (Boehringer Mannheim, East Sussex, UK) and
the reamplified target bands extracted from the
gel as above. The eluted DNA was immediately
ligated into a TOPO TA Cloning® vector (Invitrogen,
Carlsbad, USA) before transformation in
Escherichia coli TOP10F' One Shot™ cells
(Invitrogen).

2.5. Colony screening 

2.5.1. Stage I 

Eight transformed (white) colonies from each
band were grown up for 6 h in 200 µl LB broth
containing ampicillin (Sigma, 50 mg/ml). One microlitre
of this was subjected to PCR using the
same conditions and nested primers as described
above. One tenth (2 µl) of the completed PCR
reaction was electrophoresed on a 2% standard
agarose gel and one tenth on a 2% standard
agarose gel containing HA red (Hanse Analytik
GmbH, Bremen, Germany, 1 U/ml) to discern the
differentially cloned products. The remainder of
the PCR reaction was used to prepare duplicate
dotblots on Hybond N+ membranes (Amersham,
Little Chalfont, UK).

2.5.2. Stage II

The duplicate dotblots were probed with (a) the
final differential display reaction and (b) the
'reverse-subtracted' differential display reaction. To
make the 'reverse-subtracted' probe, the subtractive
hybridisation step of the differential display
RT-PCR procedure was carried out using the
original tester (treated) mRNA as the driver and
the original driver (control) mRNA as the tester.
Probing and visualisation were carried out using
the ECL direct nucleic acid labelling and detection
system (Amersham, Little Chalfont, UK) according
to the manufacturer's instructions. Those
clones that were positive for (a) but negative for
(b), or showed a substantially larger positive signal
with (a) compared to (b), were selected for
DNA sequence analysis.
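
The stage II selection rule can be written down as a small decision function. The sketch below is only an illustration under stated assumptions: the text gives no numerical cut-offs, so the `background` level and the `min_ratio` used to stand in for "substantially larger" are hypothetical parameters, as is the function name `classify_clone`.

```python
def classify_clone(forward_signal: float, reverse_signal: float,
                   background: float = 1.0, min_ratio: float = 3.0) -> str:
    """Classify one dot-blot clone from its two probe signals.

    forward_signal: intensity with probe (a), the final differential display.
    reverse_signal: intensity with probe (b), the reverse-subtracted display.
    background and min_ratio are hypothetical thresholds (not from the paper).
    """
    forward_positive = forward_signal > background
    reverse_positive = reverse_signal > background

    # Positive for (a) but negative for (b): taken forward for sequencing.
    if forward_positive and not reverse_positive:
        return "selected"
    # Positive for both, but substantially stronger with (a) than with (b).
    if forward_positive and reverse_positive and forward_signal >= min_ratio * reverse_signal:
        return "selected"
    # Signal with both probes at similar strength, with (b) only, or with neither.
    return "rejected"


for signals in [(10.0, 0.5), (10.0, 2.0), (10.0, 8.0), (0.2, 0.3)]:
    print(signals, classify_clone(*signals))
```

In practice the comparison was made by eye from the ECL-developed blots, so any numerical threshold would need to be calibrated against the detection system used.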

2.6. DNA sequencing

The remainder of the cultures (prepared in
stage I screening) containing different cloning
products (as discerned in the two screening steps)
were grown up overnight in 5 ml LB broth containing
ampicillin (50 mg/ml). A plasmid miniprep
was prepared from each (Wizard Plus SV
Minipreps DNA purification system, Promega)
according to the manufacturer's instructions. The
cloned inserts were sequenced on an automated
ABI DNA sequencer (Applied Biosystems, Warrington,
UK) using the M13 forward primer
(GTAAAACGACGGCCAGT) or M13 reverse
primer (AACAGCTATGACCATG).

2.7. Identification of differentially regulated genes

Gene sequences thus obtained were identified
using the FASTA 3.0 programme (Lipman and
Pearson, 1985; Pearson and Lipman, 1988)
(http://www.ddbj.nig.ac.jp/E-mail/homology.html) to
search all EMBL databases for matching DNA
sequences. Each clone sequence was submitted in
the forward and reverse direction, and the one
returning the highest statistical probability of
match to a known sequence was noted. Sequence
homologies between our submitted clone sequence
and the queried database sequence were determined
(by FASTA) over a region of at least 60
base pairs.
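
The idea of submitting each clone in both orientations and keeping the better match can be illustrated with a much simpler stand-in for FASTA. The sketch below assumes an ungapped, equal-length comparison and is not the FASTA algorithm itself; the names `reverse_complement`, `percent_identity` and `best_orientation` are illustrative, and only the 60-base minimum region is taken from the text.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_orientation(clone: str, reference: str, min_region: int = 60):
    """Compare a clone to a reference in both orientations; keep the better one."""
    if len(clone) < min_region:
        raise ValueError("compared region is shorter than the minimum length")
    forward = percent_identity(clone, reference)
    reverse = percent_identity(reverse_complement(clone), reference)
    return ("forward", forward) if forward >= reverse else ("reverse", reverse)


# Toy reference built from the M13 forward primer sequence quoted in the text.
reference = "GTAAAACGACGGCCAGT" * 4
print(best_orientation(reverse_complement(reference), reference))
```

A real search would align against the whole database with gaps allowed; this only shows why the orientation of the submitted read matters when it is not known in advance.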

2.8. RT-PCR analysis of selected candidate genes

cDNA sequences of the target genes were obtained
from the NIH gene database (GenBank at
http://www.ncbi.nlm.nih.gov/Web/Search/index.html)
and the computer programme Gene
Jockey (BioSoft, Cambridge, UK) used to select
primer pairs from these sequences. Where guinea
pig sequences were available, rat and guinea pig
sequences were aligned and primers chosen from
regions of homology. If guinea pig sequences were
not available, rat and human sequences were
used. In cases where exact homology could not be
found, the sequence from the rat was used. In the
case of CD81 only, no rat or guinea pig sequences
were available and so mouse and human sequences
were aligned and a primer pair chosen
from a region of homology. Primers (obtained
from Gibco-BRL, Paisley, UK) were dissolved at
a concentration of 50 pmol/µl in sterile distilled
water and stored at -20°C. The primer pairs
used plus other reaction parameters are shown in
Table 1. mRNA was extracted (as described
above) from all four treated animals and from
three animals in the control group. Integrity of
the eluted mRNA was confirmed on a 2% agarose
gel, and the concentration and purity were measured
using a Genequant II spectrophotometer
(LKB, Bromma, Sweden) and then diluted to 10
ng/µl. One microlitre of this latter solution was
used per RT-PCR reaction.

RT-PCR was carried out in a single tube (50 µl)
reaction using the Access RT-PCR system
(Promega) according to manufacturer's instructions.
In the kinetic and quantitative analyses,
omission of RNA was used as a control for the
presence of any contaminating DNA. After obtaining
a PCR signal of the correct size and
optimising the reaction conditions, each PCR
product was digested with between two and four
separate restriction enzymes. Specific restriction
patterns were thus obtained, which further confirmed
the identity of the PCR products as being
the original target genes. Kinetic analysis (14-32
cycles) was then performed in each case to determine
the location of the mid-log phase.

For the semi-quantitative analysis of each
target gene, RT-PCR reactions were carried out in
triplicate for each sample to reduce the effect of
intertube RT-reaction variations (Kolls et al.,
1993) and pipetting errors. For each gene, a mastermix
containing enough reagents for three times
the number of samples (seven for rat, six for
guinea pig) was prepared except that mRNA was
omitted, the latter being added after aliquoting 49
µl of the mastermix into an appropriate number
of tubes. Amplification of albumin (the reference
gene) was carried out in separate tubes since the
mid-log phase of this gene is at a much lower
cycle number than the target genes due to its high
abundance. All RT-PCR products were analysed
on 2% agarose gels containing 0.5 µg/ml ethidium
bromide. The target gene samples were loaded on
the gel first and run in at 3 V/cm for 10 min. The
corresponding albumin samples were then loaded
and the gel run for a further 1/2 h. In this way, all
RT-PCR products from each target gene and
albumin from the corresponding samples could be
run on the same gel. Gels were photographed
using type 665 posi-neg film (Sigma) and quantitation
of the band intensity was carried out using
a dual wavelength flying spot laser scanner densitometer
(Shimadzu).
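
The mastermix arithmetic described above can be sketched as follows. The sample counts (seven rat, six guinea pig), triplicate reactions, 49 µl aliquots and 1 µl of mRNA at 10 ng/µl are taken from the text; the function name `mastermix_plan` is an illustrative assumption, and no pipetting overage is included because none is stated.

```python
def mastermix_plan(n_samples: int, replicates: int = 3,
                   mix_per_tube_ul: float = 49.0,
                   mrna_per_tube_ul: float = 1.0,
                   mrna_conc_ng_per_ul: float = 10.0) -> dict:
    """Return tube count and volumes for one target gene in one species."""
    n_tubes = n_samples * replicates
    return {
        "tubes": n_tubes,
        "mastermix_ul": n_tubes * mix_per_tube_ul,          # mRNA-free mix to prepare
        "reaction_volume_ul": mix_per_tube_ul + mrna_per_tube_ul,
        "mrna_ng_per_tube": mrna_per_tube_ul * mrna_conc_ng_per_ul,
    }


for species, n_samples in {"rat": 7, "guinea pig": 6}.items():
    print(species, mastermix_plan(n_samples))
```

For the rat this gives 21 tubes and 1029 µl of mastermix per gene, and for the guinea pig 18 tubes and 882 µl, with 10 ng of mRNA in each 50 µl reaction.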

2.9. Statistical analysis

Statistical analysis of unpaired samples was carried
out using the two-tailed Student's t-test. Values
were considered statistically significant at
P < 0.05 or less.
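
For readers who want to reproduce this kind of comparison, a two-tailed unpaired Student's t-test can be sketched with the standard library alone. This is a minimal illustration assuming equal variances (the pooled form); the band-intensity values are invented purely for demonstration and are not data from the study, and the P value is obtained by numerically integrating the t density rather than from a statistics package.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed P value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def students_t_test(a, b):
    """Unpaired, equal-variance t-test; returns (t, degrees of freedom, P)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df, two_tailed_p(t, df)


# Hypothetical normalised band intensities: four treated and three control animals.
treated = [2.1, 2.4, 2.2, 2.5]
control = [1.0, 1.2, 1.1]
t, df, p = students_t_test(treated, control)
print(f"t = {t:.2f}, df = {df}, significant at P < 0.05: {p < 0.05}")
```

With four treated and three control animals the test has 5 degrees of freedom, for which the two-tailed 5% critical value of t is about 2.57.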



3. Results

Fig. 1. Final displays of differentially expressed genes that
were (1) upregulated and (2) downregulated in rat (A) and
guinea pig (B) livers following 3-day treatment with Wy-14,643.
mRNA extracted from control and treated livers was
used to generate the differential displays using the PCR-Select
cDNA subtraction kit (Clontech). Lane (L) is a 1 Kb DNA
ladder standard and 10 µl of secondary PCR reaction were
loaded in all other lanes.



3.1. Cloning and screening of transcripts 

For both the rat and guinea pig experimental 
groups, cDNA subtraction was carried out in the 
forward (control driving tester) and reverse (tester 
driving control) directions to isolate both upregu- 
lated and downregulated mRNA species respec- 
tively. Using a standard primary hybridisation 
time of 8 h we obtained a substantial amount of 
non-specific products in all the final differential 
displays (data not shown). This background 
smearing was almost completely removed by re- 
ducing the primary hybridisation time to 4 h 
(CLONTECHniques, 1996). Fig. 1 shows the 
ddRT-PCR patterns of genes showing altered ex- 
pression in rat and guinea pig liver following 
3-day treatment with Wy-14,643. The profiles are
unique for each species, and in each case the 
profile for the upregulated genes (control mRNA 
driving tester mRNA) is different to that obtained 
for the downregulated genes (tester mRNA driv- 
ing control mRNA). 

Fig. 2. Discrimination of different ddRT-PCR products having
the same molecular size using HA-red. Gel (A) is a 2%
standard agarose gel. Gel (B) is a 2% standard agarose gel
containing 1 U/ml HA-red. Band numbers refer to the sequential
bands (largest to smallest) extracted from the original
display of genes upregulated in rat liver following 3-day treatment
with Wy-14,643. Ten microlitres of each PCR reaction
were loaded per lane.

The practical outcome of the SSH method is
that a series of differentially expressed genes is
observed as a ladder on an agarose gel. The
majority of these gene fragments fall within the
150-2000 bp range, with bands up to 5 Kbp
occasionally being observed. Each band may theoretically
consist of one or more products of
similar size, as the gel has a maximum resolution
of approximately 1.5% (3 bp per 200). In addition,
there may be two or more products that are
the same size, but have a different sequence.
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Therefore some form of discrimination must be 
employed to isolate as many of these products as
possible. HA-red screening (Geisinger et al., 1997)
of a number of clones derived from each band
provided a means to discriminate between different
gene species of the same size. A typical example
of such a gel is shown in Fig. 2. In total, 88
and 48 apparently different clones were obtained
from the final differential expression patterns of
upregulated and downregulated rat genes, respectively.
Sixty-nine and 89 apparently different
clones were obtained from the final differential
expression patterns of the upregulated and downregulated
guinea pig genes, respectively.

Having identified as many different candidate 
gene products as possible in the screening step I, a 
second screening step was carried out on every 
clone to confirm those that represented true dif- 
ferentially expressed genes. This is necessary since 
no subtraction technique is 100% efficient. The 
approach we used, termed PCR-select differential 
screening (as recommended in Clontech's PCR-se- 
lect cDNA subtraction kit protocol), utilises the 
forward and reverse subtractions as an aid to 
screening for the true differentially expressed 
genes (CLONTECHniques, 1997). Because these 
probes have already undergone subtraction, they 
have been enriched for differentially expressed 
genes and are therefore more sensitive than un- 
subtracted driver/tester cDNA probes for detect- 
ing true differential expression. All the clones that 
were isolated from each display were dotblotted
and probed with the display from which they were
obtained, plus the corresponding reverse-subtracted
display. An example of such a blot is
shown in Fig. 3. Clones corresponding to authentic
differentially expressed mRNAs hybridised
with the subtracted cDNA probe, but not the 
reverse-subtracted probe. We also included in the 
authentic positives, those clones that gave a sub- 
stantially greater signal with the subtracted probe 
compared to the reverse-subtracted probe. False 
positives hybridised with either both probes or 
with neither probe. Of the original 88 upregulated 
and 48 downregulated rat clones selected for this 
screening step, 28 (32%) and 15 (31%) respec- 
tively, were found to be true positives. In the rat, 
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8 (100%) of the true positive upregulated genes 
Table 2) and 1 1 (73%) of the true positive down- 
egulated genes (Table 3) were non-redundant. Of 
ae original 69 upregulated and 89 downregulated 
uinea pig clones selected for this screening step, 
8 (70%) and 37 (42%) respectively, were found to 
e true positives. Thirty six (75%) of the upregu- 
tted genes (Table 4) and 33 (89%) of the down- 
.gulated genes (Table 5) were non-redundant. 
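
The screening yields quoted above can be checked with a few lines of arithmetic. The sketch below assumes true-positive counts of 28 (rat, upregulated) and 48 (guinea pig, upregulated), which are the values consistent with the stated 32%, 100%, 70% and 75%; the dictionary layout and the helper `pct` are illustrative only.

```python
def pct(part: int, whole: int) -> int:
    """Percentage of part in whole, rounded to the nearest integer."""
    return round(100 * part / whole)


# (clones screened, true positives, non-redundant true positives)
screen = {
    ("rat", "upregulated"): (88, 28, 28),
    ("rat", "downregulated"): (48, 15, 11),
    ("guinea pig", "upregulated"): (69, 48, 36),
    ("guinea pig", "downregulated"): (89, 37, 33),
}

for (species, direction), (screened, true_pos, unique) in screen.items():
    print(f"{species}, {direction}: "
          f"{true_pos}/{screened} true positives ({pct(true_pos, screened)}%), "
          f"{unique}/{true_pos} non-redundant ({pct(unique, true_pos)}%)")
```

The computed percentages agree with those reported, which supports reading the counts as 28 and 48 respectively.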

3.2. Identification of clones

On sequence analysis it was found that some
clones were unsequencable in the first instance
(M13 forward primer) due to long polyA runs
that appeared to prematurely terminate the sequencing
reaction. These clones were therefore
sequenced from the opposite direction using the
M13 reverse primer. Those xenobiotic-modulated
gene products identified to date are listed in Tables
2 and 3 (rat) and Tables 4 and 5 (guinea pig).




Fig. 3. Dot blots of clones of putative upregulated gene species
isolated from guinea pig liver following 3-day treatment with
Wy-14,643. All clones identified in the stage 1 screening step
(see methods) were blotted and probed with (A) the differential
display from which they originated (control driving
treated) and (B) the reverse subtraction (treated driving control).
Arrows indicate some of the true differentially expressed clones.



Table 2

Identification of genes that were upregulated in male rat liver
following 3-day treatment with Wy-14,643

FASTA-EMBL gene identification (rat unless otherwise stated) | Accession No. | Sequence homology^a (%)

Carnitine octanoyl transferase | RN26033 | 99
NCI_CGAP_Li1 (H. sapiens) (EST^b) | HS1275949 | 98
Peroxisomal enoyl hydratase-like protein | RN08976 | 98
Liver fatty acid binding protein | V01235 | 96
Soares mouse p3NMF19.5 M. musculus cDNA clone | AA038051 | 96
Cytochrome P450IVA1 | RNCYPLA | 94
Mit. 3-hydroxy-3-methylglutaryl CoA synthase | RNHMGCOA | 94
Rab geranylgeranyl transferase component B | RNRABGERA | 94
Genes for 18S, 5.8S, and 28S ribosomal RNAs | RNRRNA | 94
Carnitine acetyl transferase (mouse) | MMRNACAR | 92
Soares mouse NML (EST) | MM1157113 | 92
Bone marrow stromal fibroblast (H. sapiens) cDNA clone HBMSF2E4 (EST) | AA545726 | 92
7.5dpc embryo (mouse) (EST) | AA408192 | 92
Alpha-1-macroglobulin | RNALPH1M | 91
Transferrin | RNTRANSA | 91
Lecithin:cholesterol acyltransferase | RNU62803 | 90
Zn-α2-glycoprotein | RNZA2GA | 90
Serum albumin | RNJALBM | 89
Fructose-1,6-bisphosphate 1-phosphohydrolase | RNFBP | [illegible]
Soares mouse melanoma (EST) (S^c) | AA124706 | 88
Soares mouse 3NbMS (EST) (AS^c) | AA154039 | 88
17-β-hydroxysteroid dehydrogenase | RN17BHDT2 | 87
Soares mouse p3NMF19.5 (EST) | AA038051 | 87
Peroxisomal enoyl-CoA:hydratase-3-hydroxyacyl CoA bifunctional enzyme | RNPECOA | 85
Integral membrane protein, TAPA-1 (CD81) (mouse) | S45012 | 81
Soares mouse lymph node (EST) | MMAA88445 | 81
H. sapiens (clone zap128) mRNA | L40401 | 76
Lysophospholipase homologue (human) | HSU67963 | 76
Soares mouse lymph node (EST) | AA2J7044 | 74

a Refers to the nucleotide sequence homology between the
cloned band isolated from the differential display and the
corresponding gene derived from the EMBL gene sequence
bank.

b EST is 'expressed sequence tag', a gene of as yet
unknown identity and function.

c Where sequence homologies were equal in both directions
of the isolated band, both the sense (S) and antisense (AS)
identities are given.



In all cases, both the forward and reverse se- 
quence of the target clones were analysed and the 
gene having the highest statistical homology 
noted. 

3.3. RT-PCR analysis of selected clones 

The results of a typical RT-PCR semi-quantitation
experiment for transferrin in the rat are given
in Fig. 4 and the results for a total of 12 selected
genes in both the rat and guinea pig are shown in
Table 6.



Table 3

Identification of genes that were downregulated in male rat
liver following 3-day treatment with Wy-14,643

FASTA-EMBL gene identification (rat unless otherwise stated) | Accession No. | Sequence homology^a (%)

NCI_CGAP_Li1 (H. sapiens) (EST^b) (S^c) | AA484528 | 99
NCI_CGAP_Pr1 (H. sapiens) (EST) (AS^c) | AA469320 | 99
UDP-glucuronosyltransferase (UGT2B12) | RN06273 | 98
Complement component C3 | RNC3 | 96
Soares mouse placenta (S) | AA023305 | 96
Ape (chimpanzee) 28S rRNA (AS) | PTRGMC | 96
Rat CYP2C11 | RNCYPM1 | 95
Ribosomal protein S5 | RNRPS5 | 94
Transthyretin | RNTTHY | 94
Contrapsin-like protease inhibitor | RNCCP23 | 89
Prostaglandin F2α (S) | RN26663 | 84
β-2-microglobulin (AS) | RNB2MR | 84
Apolipoprotein C-III | RNAPOA02 | 82
Parathymosin-alpha (zinc2+-binding protein) | RN112NBP | 75



a Refers to the nucleotide sequence homology between the 
cloned band isolated from the differential display and the 
corresponding gene derived from the EMBL gene sequence 
bank. 

b EST is 'expressed sequence tag' — a gene of as yet 
unknown identity and function. 

c Where sequence homologies were equal in both directions, 
both the sense (S) and antisense (A) identities are given. 



4. Discussion 

It is now apparent that all cancers arise from 
accumulated genetic changes within the cell. Al- 
though documenting and explaining these changes 
presents a formidable obstacle to understanding 
the different mechanisms of carcinogenesis, the 
experimental methodology is now available to 
begin attempting this difficult challenge. In order
to begin the elucidation of the molecular
mechanisms involved in non-genotoxic
hepatocarcinogenesis, we have used SSH to identify a number of
genes that are upregulated or downregulated in
male rat and guinea pig livers following short
term exposure to the PP, Wy-14,643. We have
used the rat model to represent a species
susceptible to the non-genotoxic carcinogenic effect of
PPs and the guinea pig as a resistant species
(Orton et al., 1984; Rodricks and Turnbull, 1987;
Lake et al., 1989; Makowska et al., 1992; Lake et
al., 1993).

Gurskaya et al. (1996), who originally devel- 
oped the SSH technique, cloned the products of 
the secondary PCR reaction and screened a small 
number of randomly selected colonies for differ- 
entially expressed clones using northern hybridisa- 
tion. However, we decided against this approach 



Table 4

Identification of genes that were upregulated in male guinea pig liver following 3-day treatment with Wy-14,643

FASTA-EMBL gene identification (guinea pig unless otherwise stated) | Accession No. | Sequence homology^a (%)

Carboxylesterase | AB010634 | 97
Complement C3 protein (GPC3) | M34054 | 97
Cytosolic aldehyde dehydrogenase (sheep) | U12761 | 92
Catalase (human) | X04076 | 89
Mitochondrial aspartate aminotransferase (pig) | M11732 | 89
Elongation factor-1-alpha (rabbit) | X62245 | 88
NCI_CGAP_Br2 H. sapiens cDNA clone (EST) (similar to chick mit. phosphoenolpyruvate carboxykinase) | AA587436 | 87
Alpha-1-antiproteinase S | M57270 | 83
10-formyltetrahydrofolate dehydrogenase (rat) | M59861 | 83
Ribosomal protein L6 (rat) | X87107 | 83
Soares pregnant uterus Nb (EST) (mouse) | AA156847 | 83
Mitochondrial citrate transport protein (human) | L77567 | 80
Cytoplasmic chaperonin hTRiC5 (human) | U17104 | 80
Alpha-1-antiproteinase F | M57271 | 77
Heterogeneous nuclear ribonucleoprotein C1/C2 (human) | D28382 | 77
Soares parathyroid tumour (EST) (similar to human serum albumin precursor) | AA860651 | 76
Stratagene mouse kidney (EST) | AA107327 | 75
Soares parathyroid tumour NbHPA human cDNA (EST) | AA860653 | 74
Soares mouse mammary gland (EST) | AA619297 | 74
cDNA clone 15 004 (EST) (human) | H01826 | 74
Soares senescent fibroblasts (EST) (mouse) | W52190 | 74
Preproalbumin (human) | E04315 | 72
cDNA clone 73 169 (EST) (human) | T56624 | 72
Vitamin D-binding protein (human) | L10641 | 71
[...]oH gene (exon 8) (human) | Y11498 | 71
[...]RL flow sorted chromosome | B05457 | 71
Soares foetal liver spleen (EST) (mouse) | AA009524 | 71
Soares foetal heart NbMH19W (EST) (mouse) | AA009421 | 69
Soares foetal heart NbHH19W H. sapiens cDNA clone (EST) | W94377 | 67
Phenylalanine hydroxylase (human) | U49897 | 67
Pyrroline-5-carboxylate dehydrogenase (human) | U24266 | 66
Glutathione-S-transferase homologue (human) | U90313 | 65
NCI_CGAP_GCB1 (EST) (human) | AA769294 | 65
Protective protein (human) | M22960 | 64
Clone 27 375 (EST) (human) | N37046 | 62
Stratagene colon (#937 204) H. sapiens cDNA clone (EST) | AA149777 | 62

a Refers to the nucleotide sequence homology between the cloned band isolated from the differential display and the corresponding
gene derived from the EMBL gene sequence bank.

Table 5

Identification of genes that were downregulated in male guinea
pig liver following 3-day treatment with Wy-14,643

FASTA-EMBL gene identification (guinea pig unless otherwise stated) | Accession No. | Sequence homology^a (%)

Complement [...] protein | [illegible] | [illegible]
Murinoglobulin | D84339 | 95
Alpha-1-antiproteinase F | M57271 | 88
Elongation factor-alpha-1 (rabbit) | X62245 | 89
Coupling protein G (human) | X04409 | 88
NCI_CGAP_Ov1 (EST^b) (human) | AA586309 | 87
Lecithin:cholesterol acetyl transferase (rabbit) | D13668 | 85
Aldolase B (human) | X00270 | 84
Anti-thrombin III (human) | E00116 | 80
Phenylalanine hydroxylase (human) | K03020 | 80
Inter-α-trypsin inhibitor (human) | D38595 | 79
Normalised rat muscle (EST) (S^c) | AA849753 | 78
Normalised rat ovary (EST) (AS^c) | AA801059 | 78
Complement factor Ba fragment (human) | X00284 | 77
Dihydrodiol dehydrogenase (human) | U05598 | 76
Spot 14 gene (thyroid-inducible hepatic protein) (human) | Y08409 | 75
BAC clone 174p12 (human) | AC004236 | 75
Mitochondrial aldehyde dehydrogenase (human) | X05409 | 74
Preproalbumin (human) | E04315 | 74
NCI_CGAP_Pr9 (EST) (human) (S) | AA533142 | 74
Normalised rat placenta (EST) (AS) | AA851197 | 74
Heparin sulfate proteoglycan (human) | J04621 | 73
cDNA clone 33 992 (EST) (human) | R24330 | 73
Retinol dehydrogenase (rat) | U33501 | 71
TAPA-1 integral membrane protein (CD81) (mouse) | S45012 | 71
Complement component c5s | M35525 | 70
Apolipoprotein B (pig) | L11235 | 69
cDNA clone 143 918 (EST) (human) | R76742 | 68
α-fibrinogen (human) | K02569 | 68
Soares foetal liver spleen INF (mouse) | W03726 | 68
Barstead bowel (EST) (mouse) | AA232049 | 67
UDP glucuronosyl transferase (cat) | AF0309137 | 66
Myeloid leukaemia cell differentiation protein (MCL-1) (human) (S) | L08246 | 65
STS SHGC-34 987 (human) (AS) | G27984 | 65
Soares mouse 3NME125 | AA222798 | 64
Stratagene mouse embryonic (EST) (S) | AA199420 | 64
Rad 52 (mouse) | AF004854 | 63

a Refers to the nucleotide sequence homology between the
cloned band isolated from the differential display and the
corresponding gene derived from the EMBL gene sequence
bank.

b EST is 'expressed sequence tag', a gene of as yet
unknown identity and function.

c Where sequence homologies were equal in both directions,
both the sense (S) and antisense (AS) identities are given.

for several reasons: (1) the kinetics of ligation and 
transformation favour the isolation of smaller 
PCR products, thereby producing a misrepresen- 
tation of larger gene products; (2) northern blot 
analysis is notoriously insensitive and is unlikely 
to confirm expression of rare transcripts; (3) there 
is no measurable end point to the screening of 
clones produced in this way other than to analyse 
every transformed colony. We used instead an
alternative approach; after running out the
differential display on a high-resolution agarose gel
(Fig. 1) and overstaining with SYBR Green I to
enhance visualisation, the composite bands were
individually extracted, reamplified and cloned.
However, it has been well documented that single
bands from differential displays often contain a
heterogeneous mixture of different products
(Mathieu-Daude et al., 1996; Smith et al., 1997).
This is because polyacrylamide gels cannot
discriminate between DNA sequences that differ in
size by less than about 0.2% (Sambrook et al.,
1989). High-resolution agarose gels such as those
used in this work are even less sensitive, normally
only discriminating products that differ in size by
at least 1.5%. The use of the HA-red screening
step enables resolution of identical or nearly
identical sequences based on their AT content (Wawer
et al., 1995) and is sensitive down to < 1%
difference. Furthermore, it is rapid, technically simple
and does not require the use of radiolabels.
Geisinger et al. (1997) originally demonstrated the
usefulness of using HA-red to identify different
products cloned from the same band of an RNA
differential display experiment by simultaneously
running them in normal agarose (to discriminate
by size) and in normal agarose containing HA-red
(to discriminate by AT content). We have found
that this approach is equally useful for identifying
different gene species cloned from the same band
of our SSH display.

Diatchenko et al. (1996) reported that SSH is
highly efficient at producing differentially
expressed gene species. However, we also included a
second screening step to further confirm that the
clones isolated from the differential display were
indeed differentially expressed. Duplicate dotblots
of the candidate clones were probed with the
display from which they were originally isolated
and with the 'reverse subtraction' display. To
make the reverse-subtracted probe, the subtractive
hybridisation step of the procedure was carried
out using the original tester cDNA as a driver,
and the original driver cDNA as a tester. In this
way, clones that are false positives can be
identified through their presence in both blots. Such
false positives most commonly arise through
having a very high abundance in the initial sample or
unusual hybridisation properties (Li et al., 1994).
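
The dot-blot screening rule can be stated compactly in code. The Python sketch below is illustrative only: the function name, the arbitrary signal units, and the two thresholds (`detect`, `min_ratio`) are assumptions introduced here, since the text gives no numerical cut-offs. A clone seen only with the reverse-subtracted probe is not discussed in the text and is treated as a false positive in this sketch.

```python
def classify_clone(subtracted_signal, reverse_signal, detect=1.0, min_ratio=3.0):
    """Classify one clone from its two dot-blot signals (arbitrary units).

    detect    -- hypothetical minimum signal counted as hybridisation
    min_ratio -- hypothetical factor for a 'substantially greater' signal
    """
    in_subtracted = subtracted_signal >= detect
    in_reverse = reverse_signal >= detect

    # Authentic: hybridises with the subtracted probe only.
    if in_subtracted and not in_reverse:
        return "true positive"
    # Authentic: present on both blots, but much stronger with the subtracted probe.
    if in_subtracted and in_reverse and subtracted_signal >= min_ratio * reverse_signal:
        return "true positive"
    # Both probes at similar strength, neither probe, or reverse probe only.
    return "false positive"


# Hypothetical signals for four clones: (subtracted probe, reverse-subtracted probe)
examples = {"clone_a": (10.0, 0.0), "clone_b": (10.0, 9.0),
            "clone_c": (30.0, 5.0), "clone_d": (0.0, 0.0)}
calls = {name: classify_clone(*signals) for name, signals in examples.items()}
```

In practice the thresholds would have to be set from the blots themselves; the sketch only fixes the logic of the comparison.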



Although the SSH method itself has been 
shown to be efficient, and despite the screening 
step that we included, there is an important caveat 
to bear in mind — namely that it is important 
that all clones be considered only as 'candidates' 
until the actual abundance of their mRNA is 
quantitated in treated and control samples.
Towards this end, we examined the expression of a
limited number of clones using semi-quantitative
RT-PCR. Albumin was used as the reference gene
as we have previously found that the expression
of this gene does not appear to change with the
treatment regime that we used (Fig. 4, and data
not shown). There are a number of interesting
points to note from our results. The first is the
presence of genes that serve as appropriate
positive controls in the upregulated and
downregulated series. For example, in the rat it can be seen
that CYP4A1 expression increases 14-fold following
treatment. Although CYP4A1 mRNA expression
levels following Wy-14,643 treatment have
not been previously reported in this model, the
figure compares favourably with that recorded by
Bell et al. (1991), who used RNAse-protection to
quantitate CYP4A1 in rat liver following treatment
with methylclofenapate, another PP. In addition,
we confirmed that the peroxisomal
enoyl-CoA:hydratase-3-hydroxyacyl-CoA bifunctional
enzyme is also upregulated 9-fold, in agreement
with the findings of Chen and Crane (1992).
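
A fold change of this kind is a ratio of reference-normalised signals. The Python sketch below shows one way such a figure could be computed from band intensities. It is an assumption-laden illustration, not the authors' quantitation procedure: the function name, the use of mean intensities, and the example numbers are all hypothetical.

```python
from statistics import mean


def fold_change(target_treated, target_control, ref_treated, ref_control):
    """Treated/control expression ratio of a target gene, with each group
    first normalised to the reference gene (albumin in the text).

    Each argument is a list of band intensities, one value per animal.
    """
    treated = mean(target_treated) / mean(ref_treated)
    control = mean(target_control) / mean(ref_control)
    return treated / control


# Hypothetical intensities (arbitrary units), not data from the paper.
example = fold_change(
    target_treated=[27.0, 29.0],   # target gene, treated animals
    target_control=[2.0, 2.0],     # target gene, control animals
    ref_treated=[10.0, 10.0],      # reference gene, treated animals
    ref_control=[10.0, 10.0],      # reference gene, control animals
)
print(f"{example:.1f}-fold")
```

Values above 1 indicate upregulation and values below 1 downregulation, matching the way Table 6 reports changes such as 14 x or 0.5 x.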

A number of genes were downregulated follow- 
ing Wy-14,643 exposure, including CYP2C11 ex- 
pression. Corton et al. (1997) reported similar 
findings and suggested that this may in part ex- 
plain why male rats exposed to Wy-14,643 and 
some other PPs have high serum estradiol levels, 
as estradiol is a substrate for CYP2C11. We have
also shown that the expression of contrapsin-like
protease inhibitor (CLPI) was downregulated by
Wy-14,643. This has not previously been reported
and we suggest that it may be linked to a
requirement for increased availability of amino acids to
accommodate the hepatomegaly induced by
treatment. Although little is known of the function of
parathymosin-α (zinc2+-binding protein), it has
been shown to interact with the globular domain
of histone H1, suggesting a role in histone
function (Kondili et al., 1996). In contrast to the
downregulation observed in this work, other
studies have shown that parathymosin-α expression is
elevated in breast cancer (Tsitsilonis et al., 1993,
1998), with the implication that parathymosin-α
may somehow be involved in regulating cell
proliferation by more than one mechanism.
Transferrin has previously been shown to be
downregulated in rat liver by hypolipidemic PPs
(Hertz et al., 1996). It is therefore interesting to
note that we isolated a clone identified as
transferrin from the upregulated display profile. Since we
confirmed by RT-PCR that transferrin is in fact
downregulated in the rat (Fig. 4), we conclude
that transferrin was either a false positive or was
incorrectly identified. It could also be that we
have isolated a close relative, splice variant or
isoform of transferrin, which demonstrates a
different expression profile under these experimental
conditions. Further investigations are therefore
required to determine which of these possibilities
is correct.

Fig. 4. Semi-quantitative RT-PCR experiment showing relative decrease in expression of transferrin in treated rat liver (RT-1 to
RT-4) compared to controls (RC-1 to RC-3). An equal amount of mRNA was used in each reaction (10 µg), and each sample was [...]

One of our most intriguing observations was 
that one gene, CD81, appeared to be upregulated 
in rat liver but downregulated in guinea pig liver 
following Wy-14,643 exposure. CD81 is a widely 
expressed cell surface protein that is involved in a 
large number of cellular functions, including ad- 
hesion, activation, proliferation and differentia- 
tion (reviewed by Levy et al., 1998). Since all of 
these functions are altered to some extent in car- 
cinogenesis, it is perhaps an important observa- 
tion that CD81 expression is differentially 
regulated in a resistant and sensitive species ex- 
posed to a non-genotoxic carcinogen. 

Albumin and ribosomal genes appear common 
to all differential displays and are thus
undesirable false positives. However, due to their high
expression in the liver, they are difficult to
remove. We also noted a number of gene species,
particularly in the guinea pig, which were
common to both upregulated and downregulated
profiles. Again, the most likely reason for these
having arisen is their high abundance.

A relatively large number of upregulated and
downregulated genes were isolated from guinea
pig liver following Wy-14,643 exposure. However,
the guinea pig genome has been relatively poorly
characterised and so many of the clones were
identified as resembling genes or ESTs from other
species. Without full-length sequence data it is
difficult to ascertain the accuracy of the assigned
identities and this must be borne in mind when
utilising data such as this, for example, in
designing effective primers for RT-PCR studies.
Although the actual isolated clone sequences can be
used to do this, their relatively small size often
restricts the ability to design effective primers. In
addition, as we observed with transferrin, using a
published full-length sequence may help to
identify false positives.

Table 6

Semi-quantitative RT-PCR analysis of selected gene species in the rat and guinea pig



By comparing the expression profiles of genes 
showing altered expression in a PP-sensitive spe- 
cies (rat) with a PP-resistant species (guinea pig) 
it was our aim to identify genes that are
mechanistically relevant to the non-genotoxic
hepatocarcinogenic action of Wy-14,643. However, few of
the genes that we have isolated were common to
both the rat and the guinea pig. This suggests
either that the molecular mechanisms of response
in these two species are so different that few genes
are commonly regulated in response to Wy-14,643
exposure, or that we have recovered only a small
proportion of those genes that have altered
expression. The latter seems the more likely scenario
since it is perceived that one of the main problems
of subtractive hybridisation and other differential
expression technologies is the inability to
consistently isolate rare gene transcripts (Bertioli et al.,
1995). This is potentially problematic in that
weakly expressed genes may play an important
role in regulating key cellular processes, and that
the majority of mRNA species are classified as


Putative change of expression following 
treatment according to dotblot 



Change according to RT-PCR 
quantitation 



Albumin
Bifunctional enzyme
CYP2C11
CYP4A1
Catalase
CD81 (TAPA-1)
Contrapsin-like protease inhibitor
Parathymosin-α (zinc2+ binding protein)
Transferrin
UDP-Glucuronosyl transferase
Unknown-1
Zn-α2-glycoprotein



Rat 

N/A 
Up 
Down 

Up 

N/A 

Up 

Down 
Down 
Up 
Down 



Guinea pig Rat 



Guinea pig 



N/A 
N/A 
N/A 

N/A 

Up 

Down 

N/A 
N/A 
N/A 
N/A 



No change No change 

Upregulated* (9 x ) N/O 

Downregulated* N/D 
(Abolished) 

Upregulated* (14 x) N/D 
No change 

N/O 



N/O 

Upregulated**(1.4 



x ) 
N/D 



Downregulated** 
(0.5 x) 

Downregulated** N/D 

(0.6 x) 

Downregulated* No change 
(0.5 x ) 

Downregulated** N/O 
(0.2 x) 

No change (P = 0.06) N/D

No change N/O

'rare' in abundance (Bertioli et al., 1995).
However, in their original paper describing the SSH
technique, Gurskaya et al. (1996) demonstrated
that SSH can enrich rare molecules between 1000-
and 5000-fold in a single round of hybridisation.
Unfortunately, due to high background smearing
in our initial experiments (which hindered
identification of single bands), we were compelled to
reduce the primary hybridisation time to only 4 h,
a step that theoretically is likely to reduce the
number of rare sequences (CLONTECHniques,
1996). Furthermore, it has been claimed by the
manufacturers that, whilst this technique can
identify changes as small as 1.5-fold between the
driver and tester populations, it is best suited to
the isolation of genes that show a greater than
5-fold increase (CLONTECHniques, 1996). In
addition, where tester and driver contain genes with
large and small differences in abundance, the SSH
method will be biased towards identifying those
genes with the large differences
(CLONTECHniques, 1996). Thus, it is most probable that we
have not isolated all of the more rarely expressed
transcripts and those demonstrating small changes
in expression.

One problem that remains is identifying the 
function of genes isolated in SSH experiments as 
described herein, some of which may be crucial to 
the process of carcinogenesis, and are, to date, 
unidentified. However, we have provided evidence 
herein that SSH can be used to begin the process 
of characterising the extent and importance of 
altered gene expression in response to a chemical 
stimulus. The developments of this approach 
should include characterisation of temporal and 
dose responses, and functional analysis studies 
including knockout mice. In combination, such 
studies should make a significant contribution to 
our understanding of the molecular mechanisms 
of action and physiological relevance of gene
regulation in non-genotoxic hepatocarcinogenesis. It
should then be possible to ascertain whether
differentially expressed genes are causally or casually
related to the chemical-induced toxicity, which
would represent a substantial mechanistic advance.

It is clear that there are also broader applica- 
tions for this experimental approach that go be- 
yond understanding the molecular mechanisms of 



peroxisome-proliferator induced non-genotoxic
hepatocarcinogenesis in rodents. The potential
medical and therapeutic benefits of elucidating the
molecular changes that occur in any given cell in
progressing from the normal to the carcinogenic
(or other diseased, abnormal or developmental)
state are very substantial. Notwithstanding the
lack of complete functional identification of
altered gene expression, gene profiling studies such
as those described herein essentially provide a 'fingerprint'
of each stage of carcinogenesis, and should help in
the elucidation of specific and sensitive biomark- 
ers for different types of cancer. Amongst other 
benefits, such fingerprints and biomarkers could 
help uncover differences in histologically identical 
cancers, and provide diagnostic tests for the earli- 
est stages of neoplasia. In addition, the genes 
identified by this approach may be incorporated 
into gene-chip DNA-arrays, thus providing a 
standard genetic fingerprint for a particular toxin 
treatment in a particular species. Interrogation of 
these gene arrays for an unknown compound that 
has a similar pattern to the known reference 
chemical would then provide evidence that the 
unknown may have a toxicity profile similar to 
the 'standard' fingerprint, thereby serving as a 
mechanistically relevant platform for further
detailed investigations.

ABSTRACT We have developed high-density DNA
microarrays of yeast ORFs. These microarrays can monitor
hybridization to ORFs for applications such as quantitative
differential gene expression analysis and screening for
sequence polymorphisms. Automated scripts retrieved sequence
information from public databases to locate predicted ORFs
and select appropriate primers for amplification. The primers
were used to amplify yeast ORFs in 96-well plates, and the
resulting products were arrayed using an automated
microarraying device. Arrays containing up to 2,479 yeast ORFs
were printed on a single slide. The hybridization of
fluorescently labeled samples to the array was detected and
quantitated with a laser confocal scanning microscope.
Applications of the microarrays are shown for genetic and gene
expression analysis at the whole genome level.



The genome sequencing projects have generated and will
continue to generate enormous amounts of sequence data. The
genomes of Saccharomyces cerevisiae, Haemophilus influenzae (1),
Mycoplasma genitalium (2), and Methanococcus jannaschii (3)
have been completely sequenced. Other model organisms have
had substantial portions of their genomes sequenced as well,
including the nematode Caenorhabditis elegans (4) and the small
flowering plant Arabidopsis thaliana (5). Given this
ever-increasing amount of sequence information, new strategies are
necessary to efficiently pursue the next phase of the genome
projects: the elucidation of gene expression patterns and gene
product function on a whole genome scale.

One important use of genome sequence data is to attempt
to identify the functions of predicted ORFs within the genome.
Many of the ORFs identified in the yeast genome sequence
were not identified in decades of genetic studies and have no
significant homology to previously identified sequences in the
database. In addition, even in cases where ORFs have
significant homology to sequences in the database, or have known
sequence motifs (e.g., protein kinase), this is not sufficient to
determine the actual biological role of the gene product.
Experimental analysis must be performed to thoroughly
understand the biological function of a given ORF's product.
Model organisms, such as S. cerevisiae, will be extremely
important in improving our understanding of other more
complex and less manipulable organisms.

To examine in detail the functional role of individual ORFs and
relationships between genes at the expression level, this work
describes the use of genome sequence information to study large
numbers of genes efficiently and systematically. The procedure
was as follows. (i) Software scripts scanned annotated sequence
information from public databases for predicted ORFs. (ii) The
start and stop position of each identified ORF was extracted
automatically, along with the sequence data of the ORF and 200
bases flanking either side. (iii) These data were used to
automatically select PCR primers that would amplify the ORF. (iv) The
primer sequences were automatically input into the automated
multiplex oligonucleotide synthesizer (6). (v) The
oligonucleotides were synthesized in 96-well format, and (vi) used in 96-well
format to amplify the desired ORFs from a genomic DNA
template. (vii) The products were arrayed using a high-density
DNA arrayer (7-10). The gene arrays can be used for
hybridization with a variety of labeled products such as cDNA for gene
expression analysis or genomic DNA for strain comparisons, and
genomic mismatch scanning purified DNA for genotyping (11).

METHODS 

Script Design. All scripts were written in UNIX Tool Command
Language. Annotated sequence information from GenBank was
extracted into one file containing the complete nucleotide
sequence of a single chromosome. A second file contained the
assigned ORF name followed by the start and stop positions of that
ORF. The actual sequence contained within the specified range,
along with 200 bases of sequence flanking both sides, was extracted
and input into the primer selection program PRIMER 0.5
(Whitehead Institute, Boston). Primers were designed so as to allow
amplification of entire ORFs. The selected primer sequences were
read by the 96-well automated multiplex oligonucleotide
synthesizer instrument for primer synthesis. The forward and reverse
primers were synthesized in two separate 96-well plates in
corresponding wells. All primers were synthesized on a 20-nmol scale.
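
The extraction step described above (ORF coordinates plus 200 flanking bases) can be illustrated with a short sketch. The original scripts were written in Tool Command Language and are not reproduced in the source; the Python below is a stand-in under stated assumptions: the function name is hypothetical, coordinates are taken to be 1-based and inclusive, and an ORF annotated with start greater than stop is simply returned in forward-strand orientation.

```python
def extract_with_flanks(chromosome, start, stop, flank=200):
    """Return the ORF sequence plus up to `flank` bases on each side.

    chromosome  -- complete nucleotide sequence of one chromosome (string)
    start, stop -- 1-based inclusive ORF coordinates (assumed convention)
    flank       -- number of flanking bases requested on each side
    """
    low, high = min(start, stop), max(start, stop)
    # Convert to 0-based slice bounds and clip at the chromosome ends.
    begin = max(low - 1 - flank, 0)
    end = min(high + flank, len(chromosome))
    return chromosome[begin:end]


# Toy chromosome: 300 bases, a 9-base ORF, then 300 more bases.
toy_chromosome = "A" * 300 + "ATGAAATAA" + "C" * 300
region = extract_with_flanks(toy_chromosome, start=301, stop=309)
print(len(region))  # ORF length plus two 200-base flanks
```

The returned region is what would be handed to a primer selection program; clipping at the chromosome ends keeps the call safe for ORFs that lie within 200 bases of a telomere.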

ORF Amplification and Purification. Genomic DNA was
isolated as described (12) and used as template for the amplification
reactions. Each PCR was done in a total volume of 100 µl. A total
of 0.2 µM each of forward and reverse primers was aliquoted into
a 96-well PCR plate (Robbins Scientific, Sunnyvale, CA); a master
mix containing 0.24 mM each dNTP, 10 mM Tris (pH 8.5), 50 mM
MgCl2, 25 units Taq polymerase, and 10 ng of template was added
to the primers, and the entire mix was thermal cycled for 30 cycles
as follows: 15 min at 94°C, 15 min at 54°C, and 30 min at 72°C.
Products were ethanol precipitated in polystyrene v-bottom
96-well plates (Costar). All samples were dried and stored at -20°C.

Arraying Procedure and Processing. Microarrays were
made as described (8).

A custom built arraying robot was used to print batches of 48
slides. The robot utilizes four printing tips which simultaneously
pick up ~1 µl of solution from 96-well microtiter plates. After
printing, the microarrays were rehydrated for 30 sec in a humid
chamber and then snap dried for 2 sec on a hot plate (100°C). The
DNA was then UV crosslinked to the surface by subjecting the
slides to 60 millijoules of energy. The rest of the poly-L-lysine
surface was blocked by a 15-min incubation in a solution of 70 mM
succinic anhydride dissolved in a solution consisting of 315 ml of
1-methyl-2-pyrrolidinone (Aldrich) and 35 ml of 1 M boric acid
(pH 8.0). Directly after the blocking reaction, the bound DNA
was denatured by a 2-min incubation in distilled water at ~95°C.
The slides were then transferred into a bath of 100% ethanol at
room temperature.

Fig. 1. Two-color fluorescent scan of a yeast microarray containing
2,479 elements (ORFs). The center-to-center distance between
elements is 345 µm. A probe mixture consisting of cDNA from yeast
extract/peptone (YEP) galactose (green pseudocolor) and YEP
glucose (red pseudocolor) grown yeast cultures was hybridized to the
array. Intensity per element corresponds to ORF expression, and
pseudocolor per element corresponds to relative ORF expression
between the two cultures.

Probe Preparation: cDNA. Yeast cultures (100 ml) were grown 
to ~1 OD600 and total RNA was isolated as described (13). Up 
to 500 µg total RNA was used to isolate mRNA (Qiagen, 
Chatsworth, CA). Oligo(dT)20 (5 µg) was added and annealed to 
[illegible] of mRNA by heating the reaction to 70°C for 10 min and 
quick chilling on ice, plus 2 µl Superscript II (200 units/µl) (Life 
Technologies, Gaithersburg, MD), 0.6 µl 50x dNTP mix (final 
concentrations were 500 mM dATP, dCTP, dGTP, and 200 [illegible] 
Cy5-dUTP (Amersham). Reactions were carried out at 42°C for 
[illegible] h [illegible] mRNA degraded by the addition of 0.3 µl 
5 M NaOH and 0.3 µl 100 mM EDTA and heating to 65°C for 
10 min. The sample was then diluted to 500 µl with TE and 
concentrated using a Microcon-30 (Amicon) to 10 µl. 

Probe Preparation: Genomic DNA. Fluorescent DNA was 
prepared from total genomic DNA as follows: 1 µg of random 
nonamer oligonucleotides was added to 2.5 µg of genomic 
DNA. The mixture was boiled for 2 min and then chilled on 
ice. A reaction mixture containing dNTPs (25 µM dATP, 
dCTP, dGTP, 10 mM dTTP, and 40 mM Cy3-dUTP or 
Cy5-dUTP), reaction buffer (New England Biolabs), and 20 
units exonuclease-free Klenow enzyme (United States 
Biochemical) was added, and the reaction was incubated at 37°C 
for 2 h. The sample was then diluted to 500 µl with TE and 
concentrated using a Microcon-30 (Amicon) to 10 µl. 

Hybridization. The [illegible] labeled probe was resuspended in 11 
µl of 3.5x SSC containing 10 ng Escherichia coli tRNA and 0.3% 
SDS. The sample was then heated for 2 min in boiling water, 
cooled rapidly to room temperature, and applied to the array. The 
array was placed in a sealed, humidified hybridization chamber. 
Hybridization was carried out for 10 h in a 62°C water bath, after 
which the arrays were washed immediately in 2x SSC/0.2% SDS. 
A second wash was performed in 0.1x SSC. 

Analysis and Quantitation. Arrays were scanned on a 
scanning laser fluorescence microscope developed by Steve 
Smith with software written by Noam Ziv (Stanford University). 
A separate scan was done for each of the two fluorophores. 
[illegible] A bounding box, fitted to the size of the DNA spots, 
was placed over each array element. The average fluorescent 
intensity was calculated by summing the intensities of each pixel 
present in a bounding box and then dividing by the total number 
of pixels. Local area background was calculated for each array 
element by determining the average fluorescent intensity at the 
edge of the bounding box. To normalize for fluorophore-specific 
variation, control spots containing yeast genomic DNA were 
applied to each quadrant during the arraying process. These 
elements were quantitated and the ratios of the signals were 
determined. These ratios were then used to normalize the 
[illegible] such that the ratios of the fluorescence of the genomic 
DNA spots were close to a value of 1.0. The average signal 
intensity at any given spot was regarded as significant if it was 
at least two standard deviations [illegible]. 

[illegible] experiment was conducted in duplicate, with the 
fluorophores representing each channel reversed. The ratios 
presented here are the average of the two experiments, except in 
the case in which the signal for the element in question was below 
the reliability threshold. The reliability threshold also determined 
the dynamic range of the experiment. For all of the experiments 
presented, the average [illegible] range [illegible]. In the cases 
where the fluorescence from a very bright spot saturates the 
detector, differential ratios will, in general, be underestimated. 
This can be compensated for by scanning at a lower overall 
sensitivity. 
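
The quantitation steps described above can be sketched in a few lines of code. The following is a minimal illustration in Python and is not the software used in this work: the function names, the square bounding box, the use of a nested list as the image, and the choice of Cy5/Cy3 as the ratio orientation are all assumptions made for the example.

```python
from statistics import mean

def quantify_element(image, top, left, size):
    """Return (average intensity, local background) for one bounding box.

    image is a list of pixel rows; the box is assumed square with size >= 2.
    """
    rows = [row[left:left + size] for row in image[top:top + size]]
    pixels = [p for row in rows for p in row]
    # Average intensity: sum of all pixels divided by the number of pixels.
    average = sum(pixels) / len(pixels)
    # Local background: mean of the pixels on the outer edge of the box.
    edge = (rows[0] + rows[-1]
            + [row[0] for row in rows[1:-1]]
            + [row[-1] for row in rows[1:-1]])
    return average, mean(edge)

def normalization_factor(control_cy3, control_cy5):
    """Mean Cy5/Cy3 ratio over the genomic-DNA control spots."""
    return mean(c5 / c3 for c3, c5 in zip(control_cy3, control_cy5))

def normalized_ratio(cy3, cy5, factor):
    """Cy5/Cy3 ratio rescaled so that control spots come out near 1.0."""
    return (cy5 / factor) / cy3
```

With this scaling, a control spot whose raw Cy5/Cy3 ratio equals the factor is reported as 1.0, which is the behavior the normalization step aims for. Whether background is subtracted before taking ratios is not stated in the text, so the sketch leaves the two quantities separate.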

RESULTS 

The accumulation of sequence information from model organisms 
presents an enormous opportunity and challenge to understand 
the biological function of many previously uncharacterized 
genes. To do this accurately and efficiently, a directed strategy 
was developed that enables the monitoring of [illegible] 
can be attached to a surface in a high-density 
format (8). In practice, it is possible to array over 6,000 elements 
[illegible] 

With this capability and the availability of the entire sequence 
of the yeast genome, our strategy was to use a directed 
approach for generating the complete genome array. This 
procedure involved synthesizing a pair of oligonucleotide 
primers to amplify each ORF. The PCR product containing 
[illegible] example, as probe for monitoring gene expression levels by 
hybridizing to the array labeled cDNA generated from isolated 
[illegible] under any experimental condition. 

Table 1. Heat shock vs. control expression data 

Ratio of gene expression 
Control  Heat   ORF      Gene     Description 
23              YLR142   PUT1     Proline oxidase 
         2J     YOL140   ARG8     Acetylornithine aminotransferase 
         Z0     YGL148   ARO2     Chorismate synthase 
         36.0   YFL014   HSP12    Heat shock protein 
         27.4   YBR072   HSP26    Heat shock protein 
         6.7    YBR054   YRO2     Similarity to HSP30 heat shock protein Yro1p 
         3.4    YCR021   HSP30    Heat shock protein 
         2.6    YER103   SSA4     Heat shock protein 
         IS     YLR259   HSP60    Mitochondrial heat shock protein HSP60 
         2.1    YBR169   SSE2     Heat shock protein of the HSP70 family 
         1.7    YBL075   SSA3     Cytoplasmic heat shock protein 
         1.4    YPL240   HSP82    Heat shock protein 
         1.4    YDR258   HSP78 
1.0             YNL007   SIS1 
1.1             YEL030            70-kDa heat shock protein 
1.9             YHR064            Heat shock protein 
         13     YBL008   HIR1     Histone transcription regulator 
2.6             YBL002   HTB2     Histone H2B.2 
3.3             YBL003   HTA2     Histone H2A.2 
33              YBR010   HHT1     Histone H3 
3.9             YBR009   HHF1     Histone H4 
         Z4     YDR343   HXT6     High-affinity hexose transporter 
         2.1    YHR092   HXT4     Moderate- to low-affinity glucose transporter 
3.6             YAR071   PHO11    Secreted acid phosphatase, 56 kDa isozyme 
         Z3     YLR096   KIN2     Ser/Thr protein kinase 
US              YER102   RPS8B    Ribosomal protein S8.e 
2.6             YBR181   RPS101   Ribosomal protein S6.e 
US              YCR031   CRY1     40S ribosomal protein S14.e 
2.7             YLR441   RP10A    Ribosomal protein S3a.e 
ZS              YHR141   RPL41B   Ribosomal protein L6a.e 
U&              YBL072   RPS8A    Ribosomal protein S8.e 
2.8             YHL015   URP2     Ribosomal protein 
2.8             YBR191   URP1     Ribosomal protein L21.e 
3.1             YLR340   RPLA0    Acidic ribosomal protein L10.e 
3.3             YGL123   SUP44    Ribosomal protein 
         5.8    YLR194            Hypothetical protein 

Primer Selection and Synthesis. The primer selection was fully 
[illegible] Primer pairs were automatically selected successfully 
for >99% of the ORFs tested. Primer sequences can thus 
be selected rapidly with minimal manual processing. A [illegible] 
set of forward and reverse primers were selected [illegible] 
ORF on chromosomes I, II, III, V, VI, VIII, IX, [illegible]. 
Primers for a representative set of ORFs (15% coverage) were 
chosen for the remaining chromosomes. With the release of the 
[illegible] sequence, the complete [illegible] has 
[illegible] 

[illegible] ORF requires [illegible] of synthetic primers, 
[illegible] 12,200 oligonucleotides will be required 
to individually amplify each target. This costly component was 
addressed with the automated multiplex oligonucleotide synthesizer, 
which efficiently synthesizes primers in a 96-well format. 
Each primer, synthesized on a 20-nmol scale, provides enough 
material for 100 amplification reactions, whereas a given PCR 
product provides enough material to generate an element on 
500-1,000 arrays. Thus, a single primer pair provides enough 
starting material for up to ~50,000 arrays. 
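
The scale of the synthesis effort follows from simple arithmetic on the figures quoted in the text. The short Python sketch below reproduces it; the ORF count of 6,100 is the number the paper uses for the whole genome, the plate count is derived here rather than reported, and the variable names are chosen only for illustration.

```python
import math

ORF_COUNT = 6100                  # yeast ORFs, as quoted in the paper
PRIMERS_PER_ORF = 2               # one forward and one reverse primer
WELLS_PER_PLATE = 96
REACTIONS_PER_PRIMER_PAIR = 100   # from one 20-nmol synthesis
ARRAYS_PER_PCR_PRODUCT = 500      # lower bound of the 500-1,000 range

# Total oligonucleotides needed to amplify every ORF individually.
oligos = ORF_COUNT * PRIMERS_PER_ORF

# 96-well synthesis plates needed (derived value, not stated in the text).
plates = math.ceil(oligos / WELLS_PER_PLATE)

# Arrays that one primer pair can ultimately supply.
arrays_per_primer_pair = REACTIONS_PER_PRIMER_PAIR * ARRAYS_PER_PCR_PRODUCT

print(oligos, plates, arrays_per_primer_pair)
```

Using the lower bound of 500 arrays per PCR product gives the ~50,000 arrays per primer pair cited above; the upper bound of 1,000 would double that figure.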

Primers were synthesized to amplify yeast ORFs. Primer 
synthesis had a failure rate of <1% in over 18 plates of 
synthesis as determined by standard trityl analysis (6). The 
success rate of the PCR amplification using the primer pairs 
was 94% based on agarose gel analysis of each PCR. The 
purified PCR products were used to generate arrays. Two 
versions of the arrays were created for the experimental results 
presented here. The first array contained 2,287 elements and 
the second array batch contained 2,479 elements. 

Genome Arrays. The amplified ORFs were arrayed onto glass 
at a spacing of 345 microns (Fig. 1). The high-density spacing of 
DNA samples allows the hybridization volumes to be minimized; 
volumes are a maximum of 10 µl. The labeled probe can 
thus be maintained at relatively high concentrations, making 1-2 
µg of mRNA sufficient for analysis. This also obviates the need 
for a subsequent amplification step and thus avoids the risk of 
altering the relative ratios of different cDNA species in the 
sample. 

Genetic Analysis: Genomic Comparison of Unrelated Strains. 
Microarrays allow efficient comparison of the genomes of different 
strains. Genomic DNA from Y55, an S. cerevisiae strain 
divergent from the reference strain S288c, was randomly labeled 
with Cy3-dUTP and hybridized simultaneously with the S288c 
DNA labeled with Cy5-dUTP. When a [illegible] 
hybridization of the DNA from the two strains was done, several 
[illegible] channel (data not shown). These include SGE1, 
[illegible] YCR105, [illegible] that the regions 
containing these genes are extremely divergent, or altogether 
deleted from the strain. Subsequent attempts to generate PCR 
products from SGE1, ENA2, and ASP3A using Y55 DNA failed. 
This result supports the conclusion that these genes are likely to 
be missing from the Y55 genome. It is interesting to note that at 
least two of the regions absent in the Y55 genome have been 
[illegible] to be deleted in mutant laboratory 
strains (14-16). In particular, the Asp-3 region appears to be 
highly prone to being deleted (15, 16). 

These results indicate that gene [illegible] can be used to efficiently 
[illegible] of an organism for large deletion polymorphisms. 
A single hybridization and scan will reveal differences 
based on differential hybridization to particular elements. It is 
reasonable to suppose that an equivalent number of genes are 
[illegible] and absent in the S288c [illegible] 
result should be viewed as a minimum estimate of the deletion 
polymorphisms that exist between these two unrelated strains, as 
intergenic deletions or small intragenic deletions would not be 
detected because considerable hybridizing material would 
remain. Sequence polymorphisms, such as deletions, are present 
in populations of every species and must at some level affect 
phenotype. One of the challenges of the genome era will be to 
critically examine sequence polymorphisms that exist in the 
natural gene pool relative to the reference genome sequence. 

Gene Expression Analysis. The arrays were used to examine 
gene expression in yeast grown under a variety of different 
conditions. Expression analysis is an ideal application of these 
arrays because a single hybridization provides quantitative expression 
data for thousands of genes. To better understand results for 
genes of known function, ORFs were placed in biologically relevant 
categories on the basis of function (e.g., amino acid catabolic 
genes) and/or pathways (e.g., the histidine biosynthesis pathway). 

Fig. 2. ORF categories displaying differential expression between 
heat shocked and untreated cultures. Bars within categories 
correspond to individual ORFs. Green shaded bars correspond to 
relative increases in ORF expression under 25°C growth conditions. 
Red shaded bars correspond to relative increases in ORF expression 
under 39°C growth conditions. 

Table 2. Cold shock vs. control expression data 

Ratio of gene expression (Control, Cold) 
Ratio   ORF      Gene     Description 
33      YOR153   PDR5     Pleiotropic drug resistance protein 
2.4     YCR012   PGK1     Phosphoglycerate kinase 
2.9     YCL040   GLK1     Aldohexose specific glucokinase 
1.4     YHR064            Heat shock protein 
2.0     YJL034   KAR2     Nuclear fusion protein 
2.1     YDR258   HSP78    Mitochondrial heat shock protein of clpb family of ATP-dependent proteases 
22      YLL039   UBI4     Ubiquitin precursor 
2.7     YLL026   HSP104   Heat shock protein 
3.1     YER103   SSA4     Heat shock protein 
33      YBR126   TPS1     α,α-Trehalose-phosphate synthase (UDP-forming) 
3.8     YPL240   HSP82    Heat shock protein 
7.9     YBR054   YRO2     Similarity to HSP30 heat shock protein Yro1p 
7.9     YBR072   HSP26    Heat shock protein 
16.5    YCR021   HSP30    Heat shock protein 
1.8     YDR343   HXT6     High-affinity hexose transporter 
2.1     YHR096   HXT5     Putative hexose transporter 
2.4     YFR053   HXK1     Hexokinase I 
2.8     YHR092   HXT4     Moderate- to low-affinity glucose transporter 
3.4     YHR094   HXT1     Low-affinity hexose (glucose) transporter 
13      YHR089   GAR1     Nucleolar rRNA processing protein 
1.7     YLR048   NAB1B    40S ribosomal protein p40 homolog b 
1.7     YLR441   RP10A    Ribosomal protein S3a.e 
1.7     YLL045   RPL4B    Ribosomal protein L7a.e.B 
1.6     YLR029   RPL13A   Ribosomal protein L15.e 
1.6     YGL123   SUP44    Ribosomal protein 
3.1     YBR067   TIP1     Cold- and heat-shock-induced protein of the Srp1/Tip1p family 
22      YER011   TIR1     Cold-shock-induced protein of the Tir1p, Tip1p family 
2.0     YCR058            Hypothetical protein 
4.2     YKL102            Hypothetical protein 

Table 3. Glucose vs. galactose expression data 

Ratio of gene expression 
Glucose  Galactose  ORF      Gene       Description 
2.1                 YHR018   ARG4       Arginosuccinate lyase 
3.5                 YPR035   GLN1       Glutamate-ammonia ligase 
ZB                  YML116   ATR1       Aminotriazole and 4-nitroquinoline resistance protein 
10                  YMR303   ADH2       Alcohol dehydrogenase II 
3.7                 YBR145   ADH5       Alcohol dehydrogenase V 
         32         YBL030   AAC2       ADP, ATP carrier protein 2 
         2.9        YBR085   AAC3       ADP, ATP carrier protein 
         2.7        YDR298   ATP5       H+-transporting ATP synthase δ chain precursor 
         US         YBR039   ATP3       H+-transporting ATP synthase γ chain precursor 
         S3         YML054   CYB2       Lactate dehydrogenase cytochrome b2 
         3.4        YML054   CYB2       Lactate dehydrogenase cytochrome b2 
         2J         YKL150   MCR1       Cytochrome-b5 reductase 
         42         YBL045   COR1       Ubiquinol-cytochrome c reductase 44K core protein 
         3.5        YDL067   COX9       Cytochrome c oxidase chain VIIA 
         2.7        YLR038   COX12      Cytochrome c oxidase, subunit VIB 
         2.6        YHR051   COX6       Cytochrome c oxidase subunit VI 
         2.4        YLR395   COX8       Cytochrome c oxidase chain VIII 
         2J         YFR033   QCR6       Ubiquinol-cytochrome c reductase 17K protein 
         23.7       YLR081   GAL2       Galactose (and glucose) permease 
         21.9       YBR018   GAL7       UDP-glucose-hexose-1-phosphate uridylyltransferase 
         21.8       YBR020   GAL1       Galactokinase 
         19.5       YBR019   GAL10      UDP-glucose 4-epimerase 
         14.7       YLR081   GAL2       Galactose (and glucose) permease 
         8.6        YDR009   GAL3       Galactokinase 
         3.0        YML051   GAL80(1)   Negative regulator for expression of galactose-induced genes 
         2.8        YML051   GAL80(2)   Negative regulator for expression of galactose-induced genes 
Z7                  YER055   HIS1       ATP phosphoribosyltransferase 
3.4                 YBR248   HIS7       Glutamine amidotransferase/cyclase 
7.4                 YCL030   HIS4       Phosphoribosyl-AMP cyclohydrolase/phosphoribosyl-ATP pyrophosphatase/histidinol dehydrogenase 
5.8                 YKR080   MTD1       Methylenetetrahydrofolate dehydrogenase (NAD+) 
6.0                 YDR019   GCV1       Glycine decarboxylase T subunit 
6.1                 YLR058   SHM2       Serine hydroxymethyltransferase 
         8.1        YML123   PHO84      High-affinity inorganic phosphate/H+ symporter 
3.5                 YDR408   ADE8       Phosphoribosylglycinamide formyltransferase (GART) 
3.6                 YDR408   ADE8       Phosphoribosylglycinamide formyltransferase (GART) 
4.4                 YAR015   ADE1       Phosphoribosylamidoimidazole-succinocarboxamide synthase 
5.6                 YMR300   ADE4       Amidophosphoribosyltransferase 
5.6                 YOR128   ADE2       Phosphoribosylaminoimidazole carboxylase 
6.0                 YGL234   ADE5,7     Phosphoribosylamine-glycine ligase and phosphoribosylformylglycinamidine cyclo-ligase 
         63         YBL015   ACH1       Acetyl-CoA hydrolase 



Heat Shock Results. A log phase culture growing in YEP/ 
dextrose medium at 25°C was split in half. One half of the 
culture remained at 25°C whereas the other half of the culture 
was shifted to 39°C. mRNA was isolated from both cultures 1 h 
after heat shock for comparison on microarrays and, although 
this time point is not optimal for measuring induction of heat 
shock mRNAs (17), many known heat shock genes exhibited 
considerable induction at this time point (Table 1; Fig. 2). 
Down-regulation of genes in the ribosomal protein and histone 
gene categories was also observed. Differential expression 
between the heat-shocked culture and the control was also 
observed for many other genes. Genes in many categories, such 
as amino acid catabolism and amino acid synthesis, exhibited 
a mixed response with some genes showing little or no 
differential expression and other genes showing a significant 
increase or decrease in gene expression in response to heat 
shock (Table 1; Fig. 2). 

Cold Shock Results. A log phase culture growing in YEP/ 
dextrose medium at 37°C was split in half. One half of the 
culture remained at 37°C while the other half of the culture was 
shifted to 18°C. mRNA was isolated from both cultures 1 h 
after cold shock for comparison on microarrays. As expected, 
two known cold shock genes (TIP1, TIR1) were expressed at 
a significantly higher level in the cold-shocked culture. Genes 
in other functional categories, such as glucose metabolism and 
heat shock, displayed a mixed response with expression of some 
genes being unaffected and other genes exhibiting significant 
up- or down-regulation in response to cold shock (Table 2). 

Steady-State Galactose vs. Glucose Results. mRNA was 
isolated from steady-state log phase YEP galactose and YEP 
glucose grown cultures for comparison on the microarrays. As 
expected, the GAL genes were expressed at a much higher 
level in the galactose culture. Many genes were differentially 
expressed in these cultures that were not a priori expected to 
exhibit differential expression. For example, some genes in the 
amino acid catabolic category were up-regulated in the galac- 
tose culture whereas genes in the one-carbon metabolism and 
purine categories were largely or entirely down-regulated in 
the galactose culture (Table 3). Genes in other categories, such 
as amino acid synthesis, abc transporter, cytochrome c, and 
cytochrome b, exhibited mixed responses; some genes in a 
category showed little or no obvious differential expression 
whereas other genes in the same category showed significant 
differential expression in the galactose and glucose cultures. 

DISCUSSION 
The results of these experiments show that many genes are 
differentially expressed under the three environmental conditions 
described here. The expected and predicted changes in gene 
expression, such as HSP12 in the heat-shocked culture, TIP1 in 
the cold-shocked culture, and GAL2 in the steady-state galactose 
culture, were observed in every case. However, in addition to the 
expected changes in gene expression, significant differential 
expression was also observed for many other genes that would 
not, a priori, be expected to be differentially expressed. For 
example, expression of PHO11 decreased and expression of 
YLR194, KIN2, and HXT6 increased in the heat-shocked culture. 
Expression of MST1 and APE3 decreased and expression of 
PDR5 and GAR1 increased in the cold-shocked culture. In 
addition, ADE4 and SER2 were expressed at reduced levels 
whereas PHO84 and ACH1 were expressed at higher levels in 
cells grown in galactose compared with cells grown in glucose. 
Differential expression of these and many other genes was specific 
to one of these three environmental conditions. 

Many other genes were found to be differentially expressed 
under more than one condition. When differentially expressed 
genes in cold- and heat-shocked cultures were compared, 30 
genes were found in common. Of these 30 genes, 28 showed 
inverse expression (i.e., increased expression under one condition 
and decreased expression under the other condition). Two genes, 
YCR058 and YKL102, showed elevated expression in response to 
both cold and heat shock. Fifteen genes were found to be 
differentially expressed in both the heat-shocked and steady-state 
galactose cultures: 9 genes showed increased expression and 5 
showed decreased expression under both conditions. Twenty 
genes were differentially expressed in both the cold-shocked and 
steady-state galactose cultures: 8 genes showed decreased expression 
and 15 genes showed increased expression under both conditions. 
Six genes showed increased expression in the galactose 
culture and decreased expression in the cold-shocked culture. 
One gene (ODP1) showed increased expression in both the 
cold-shocked and steady-state galactose cultures. 
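
The cross-condition comparison described above amounts to intersecting two lists of differentially expressed genes and then sorting the shared genes by whether they respond in the same or in opposite directions. A minimal Python sketch follows; it is an illustration only, not the analysis performed here. The signed fold-change encoding (positive for induced, negative for repressed), the twofold cutoff, the function name, and the gene labels g1 to g5 are all hypothetical.

```python
def compare_conditions(ratios_a, ratios_b, threshold=2.0):
    """Split genes differentially expressed in both conditions into
    those changing in the same direction and those changing inversely.

    ratios_a, ratios_b: dict mapping gene -> signed fold change
    (positive = induced, negative = repressed). Hypothetical encoding.
    """
    shared = {
        gene for gene in ratios_a
        if gene in ratios_b
        and abs(ratios_a[gene]) >= threshold
        and abs(ratios_b[gene]) >= threshold
    }
    same = sorted(g for g in shared
                  if (ratios_a[g] > 0) == (ratios_b[g] > 0))
    inverse = sorted(shared - set(same))
    return same, inverse

# Made-up values for illustration; these are not data from the tables.
cold = {"g1": 3.0, "g2": -2.5, "g3": 1.2, "g4": 4.0}
heat = {"g1": 2.2, "g2": 3.1, "g3": -5.0, "g5": 2.0}
same_direction, inverse_direction = compare_conditions(cold, heat)
```

In this toy input only g1 and g2 pass the cutoff under both conditions; g1 changes in the same direction in both, and g2 shows the inverse pattern.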

Gene expression is affected in a global fashion when environmental 
conditions are changed and both expected and unexpected 
genes are affected. There is overlap in the genes that 
are differentially expressed under quite different environmental 
conditions. These results can be rationalized by considering the 
high degree of cross-pathway regulation in yeast. For example, 
there is evidence for cross-pathway regulation between (i) carbon 
and nitrogen metabolism (18), (ii) phosphate and sulfate metabolism 
(19), and (iii) purine, phosphate, and amino acid metabolism 
(20-24). There are also examples of the interaction of 
general and specific transcription factors (25, 26). Finally, within 
the broad class of amino acid biosynthetic genes, there is evidence 
for amino acid specific regulation of some genes, regulation via 
general control for other genes, and regulation via both specific 
and general control for other genes (22, 27-30). 

Cross-pathway regulation arises from the complex structure 
of promoters. Virtually all promoters contain sites for multiple 
transcription factors and, therefore, virtually all genes are 
subject to combinatorial regulation. For example, the HIS4 
promoter contains binding sites for GCN4 (the general amino 
acid control transcription factor), PHO2/BAS2 (a transcriptional 
regulator of phosphatase and purine biosynthetic 
genes), and BAS1 (a transcriptional regulator of purine biosynthetic 
genes) (31). It is likely that the complex effects on 
gene expression described in this work are a direct consequence 
of the combinatorial regulation of gene expression. 

These findings illustrate the power of the highly parallel whole 
genome approach when examining gene expression. The global 
effects of environmental change on gene expression can now be 
[illegible]. It is clear that determining the mechanism(s) 
and the functional role of the dramatic global effects on gene 
expression in different environments will be a significant challenge. 
The era of whole genome analysis will ultimately allow 
researchers to switch from the very focused single gene [illegible] 
view of gene expression and instead view the [illegible] 
complex network of gene regulatory pathways. 

Proc. Natl. Acad. Sci. USA 94 (1997) 

With the entire sequence of this model organism known, new 
[illegible] have been developed that allow for genome-wide 
analyses (32, 33) of gene function. The genome microarrays 
represent a novel tool for genetic and expression analysis of the 
yeast genome. This pilot study uses arrays containing >35% of 
the yeast ORFs. It is clear that the entire set of ORFs from 
the yeast genome can be arrayed using the directed primer based 
[illegible] 
will allow all 6,100 ORFs to be arrayed in an area of less than [illegible]. 
[illegible] technology improves, detection limits 
[illegible] 500 [illegible] of [illegible] to be used. 

[illegible] arrays provide for a robust, fully automated 
[illegible] examining genome structure and gene function. 
They allow for comparisons between different genomes 
[illegible] will help to elucidate relationships between 
genes and allow the researcher to understand gene function by 
understanding expression patterns across the yeast genome. 

Exploring the Metabolic and Genetic Control of 
Gene Expression on a Genomic Scale 

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown* 

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used 
to carry out a comprehensive investigation of the temporal program of gene expression 
accompanying the metabolic shift from fermentation to respiration. The expression 
profiles observed for genes with known metabolic functions pointed to features of the 
metabolic reprogramming that occur during the diauxic shift, and the expression patterns 
of many previously uncharacterized genes provided clues to their possible functions. The 
same DNA microarrays were also used to identify genes whose expression was affected 
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcrip- 
tional activator YAP1. These results demonstrate the feasibility and utility of this ap- 
proach to genomewide exploration of gene expression patterns. 



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2), 
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially 
favorable organism in which to conduct a 
systematic investigation of gene expression. 
The genes are easy to recognize in the ge- 
nome sequence, cis regulatory elements are 
generally compact and close to the tran- 
scription units, much is already known 
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its 
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by 
using a simple robotic printing device (9). 
Cells from an exponentially growing culture 
of yeast were inoculated into fresh medium 
and grown at 30°C for 21 hours. After an 
initial 9 hours of growth, samples were har- 
vested at seven successive 2-hour intervals, 
and mRNA was isolated (10). Fluorescently 
labeled cDNA was prepared by reverse tran- 
scription in the presence of Cy3(green)- 
or Cy5(red)-labeled deoxyuridine triphos- 
phate (dUTP) (11) and then hybridized to 
the microarrays (12). To maximize the re- 
liability with which changes in expression 
levels could be discerned, we labeled cDNA 
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3- 
labeled "reference" cDNA sample prepared 
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity 
measured for the Cy3 and Cy5 fluors at 
each array element provides a reliable mea- 
sure of the relative abundance of the corre- 
sponding mRNA in the two cell popula- 
tions (Fig. 1). Data from the series of seven 
samples (Fig. 2), consisting of more than 
43,000 expression-ratio measurements, 
were organized into a database to facilitate 
efficient exploration and analysis of the 
results. This database is publicly available 
on the Internet (13). 

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the 
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels dif- 
fered by a factor of 2 or more for only 19 
genes (0.3%), and the largest of these dif- 
ferences was only 2.7-fold (14). However, as 
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for 
203 genes diminished by a factor of at least 
4. About half of these differentially ex- 
pressed genes have no currently recognized 
function and are not yet named. Indeed, 
more than 400 of the differentially ex- 
pressed genes have no apparent homology 
to any gene whose function is known (15). 
The responses of these previously unchar- 
acterized genes to the diauxic shift therefore 
provide the first small clue to their possible 
roles. 
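
The gene counts quoted above come from applying fold-change cutoffs to the expression ratios measured across the time course. How exactly a gene was scored is not spelled out in this passage, so the following Python sketch rests on an assumption: a gene counts as induced if its ratio relative to the reference sample reaches the cutoff at any time point, and as diminished if the ratio falls to the reciprocal of the cutoff at any time point. The function name and the toy data are hypothetical.

```python
def fold_change_summary(ratios_by_gene, factor=2.0):
    """Count genes induced or diminished by at least `factor`.

    ratios_by_gene: dict mapping gene -> list of sample/reference
    expression ratios, one per time point.
    Returns (induced, diminished). A gene that crosses both cutoffs
    at different time points is counted in both totals.
    """
    induced = 0
    diminished = 0
    for ratios in ratios_by_gene.values():
        if max(ratios) >= factor:
            induced += 1
        if min(ratios) <= 1.0 / factor:
            diminished += 1
    return induced, diminished

# Made-up ratios for three genes over three time points.
toy = {
    "a": [1.0, 1.5, 4.2],   # induced more than fourfold
    "b": [1.0, 0.6, 0.2],   # diminished more than fourfold
    "c": [1.0, 1.1, 0.9],   # essentially unchanged
}
twofold = fold_change_summary(toy, factor=2.0)
fourfold = fold_change_summary(toy, factor=4.0)
```

Applied to the full data set with factors of 2 and 4, a tally of this kind would yield counts comparable to those reported in the text, although the authors' exact scoring rule may differ.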

The global view of changes in expres- 
sion of genes with known functions pro- 
vides a vivid picture of the way in which 
the cell adapts to a changing environ- 
ment. Figure 3 shows a portion of the yeast 
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes
coding for the enzymes aldehyde dehydro-
genase (ALD2) and acetyl-coenzyme
A (CoA) synthase (ACS1), which func-
tion together to convert the products of
alcohol dehydrogenase into acetyl-CoA, 
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away 
from acetaldehyde, and instead to oxalac- 
etate, where it can serve to supply the 
TCA cycle and gluconeogenesis. Induc-
tion of the pivotal genes PCK1, encoding
phosphoenolpyruvate carboxykinase, and
FBP1, encoding fructose 1,6-bisphos-
phatase, switches the directions of two key
irreversible steps in glycolysis, reversing
the flow of metabolites along the revers-
ible steps of the glycolytic pathway toward
the essential biosynthetic precursor, glu-
cose-6-phosphate. Induction of the genes
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of 
genes encoding pivotal enzymes can pro- 
vide insight into metabolic reprogram- 
ming, the behavior of large groups of func- 
tionally related genes can provide a broad 
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate cy-
cle and carbohydrate storage, were coordi-
nately induced by glucose exhaustion. In
contrast, genes devoted to protein synthe- 
sis, including ribosomal proteins, tRNA 
synthetases, and translation elongation
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy 
and illuminating exception was that the 



genes encoding mitochondrial ribosomal 
genes were generally induced rather than 
repressed after glucose limitation, high-
lighting the requirement for mitochondrial
biogenesis (13). As more is learned about 
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of ex- 
pression could be recognized, and sets of 
genes could be grouped on the basis of the 
similarities in their expression patterns. The 
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be 
inferred for sets of genes with similar expres- 
sion profiles. For example, seven genes 
showed a late induction profile, with mRNA 
levels increasing by more than ninefold at 



the last timepoint but less than threefold at
the preceding timepoint (Fig. 5B). All of
these genes were known to be glucose-re-
pressed, and five of the seven were previously
noted to share a common upstream activat- 
ing sequence (UAS), the carbon source re- 
sponse element (CSRE) (16-20). A search 
in the promoter regions of the remaining two 
genes, ACR1 and IDP2, revealed that
ACR1, a gene essential for ACS1 activity,
also possessed a consensus CSRE motif, but
interestingly, IDP2 did not. A search of the 
entire yeast genome sequence for the con- 
sensus CSRE motif revealed only four addi- 
tional candidate genes, none of which 
showed a similar induction. 

Examples from additional groups of 
genes that shared expression profiles are 
illustrated in Fig. 5, C through F. The 
sequences upstream of the named genes in 
Fig. 5C all contain stress response ele-
ments (STRE), and with the exception
of HSP42, have previously been shown to
be controlled at least in part by these
elements (21-24). Inspection of the se-
quences upstream of HSP42 and the two
uncharacterized genes shown in Fig. 5C,
YKL026c, a hypothetical protein with
similarity to glutathione peroxidase, and
YGR043c, a putative transaldolase, re-
vealed that each of these genes also pos-
sesses repeated upstream copies of the stress-
responsive CCCCT motif. Of the 13 ad-
ditional genes in the yeast genome that
shared this expression profile [including
HSP30, ALD2, OM45, and 10 uncharac-
terized ORFs (25)], nine contained one or
more recognizable STRE sites in their up- 
stream regions. 

The heterotrimeric transcriptional acti- 
vator complex HAP2,3,4 has been shown 
to be responsible for induction of several 
genes important for respiration (26-28). 
This complex binds a degenerate consensus 
sequence known as the CCAAT box (26). 
Computer analysis, using the consensus se- 
quence TNRVTGGB (29), has suggested 
that a large number of genes involved in 
respiration may be specific targets of 
HAP2,3,4 (30). Indeed, a putative
HAP2,3,4 binding site could be found in
the sequences upstream of each of the seven
cytochrome c-related genes that showed
the greatest magnitude of induction (Fig. 
5D). Of 12 additional cytochrome c-related 
genes that were induced, HAP2,3,4 binding 
sites were present in all but one. Signifi- 
cantly, we found that transcription of 
HAP4 itself was induced nearly ninefold 
concomitant with the diauxic shift. 
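
The consensus TNRVTGGB quoted above is written in IUPAC degenerate nucleotide codes (N = any base, R = A or G, V = A, C, or G, B = C, G, or T). A minimal sketch of how such a consensus could be turned into a pattern and scanned against an upstream sequence follows. It is an illustration under stated assumptions, not the analysis referenced in (29, 30): the function names and the test sequence are hypothetical, and only the strand as written is scanned.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular-expression source."""
    return "".join(IUPAC[base] for base in consensus.upper())

def find_sites(sequence, consensus):
    """Return 0-based start positions of every (possibly overlapping) match.

    Only the given strand is scanned; the reverse strand would need a
    second pass over the reverse complement.
    """
    # The lookahead lets the scan report overlapping occurrences.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]

# Hypothetical sequence for illustration only.
hits = find_sites("CCTAGATGGCAA", "TNRVTGGB")
```

Because the consensus is short and degenerate, matches are expected by chance in many promoters, which is why the text treats such sites as putative.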

Control of ribosomal protein biogenesis 
is mainly exerted at the transcriptional 
level, through the presence of a common 
upstream-activating element (UASrpg)
that is recognized by the Rap1 DNA-bind-
ing protein (31, 32). The expression pro-
files of seven ribosomal proteins are shown
in Fig. 5F. A search of the sequences
upstream of all seven genes revealed con-
sensus Rap1-binding motifs (33). It has
been suggested that declining Rap1 levels
in the cell during starvation may be re-
sponsible for the decline in ribosomal pro-
tein gene expression (34). Indeed, we ob-
served that the abundance of RAP1
mRNA diminished by 4.4-fold, at about
the time of glucose exhaustion. 

Of the 149 genes that encode known or 
putative transcription factors, only two, 
HAP4 and SIP4, were induced by a factor of
more than threefold at the diauxic shift.
SIP4 encodes a DNA-binding transcrip-
tional activator that has been shown to
interact with Snf1, the "master regulator" of
glucose repression (35). The eightfold in-
duction of SIP4 upon depletion of glucose
strongly suggests a role in the induction of 



downstream genes at the diauxic shift. 

Although most of the transcriptional 
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were re- 
producible in duplicate experiments. For 
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expres- 



sion ratios measured in these duplicate 
experiments differed by less than a factor 
of 2. However, in a few cases, there were
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
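
The reproducibility figures above (a correlation coefficient between duplicate hybridizations, and the share of genes whose duplicate ratios differ by less than a factor of 2) can be computed along the following lines. This is a minimal sketch, not the authors' procedure: the text does not say whether the correlation was taken on raw or log-transformed ratios, so the use of log2 values in the twofold comparison is an assumption, and the function names and example values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fraction_within_factor(ratios_a, ratios_b, factor=2.0):
    """Share of genes whose two measured ratios differ by less than `factor`."""
    limit = math.log2(factor)
    close = sum(
        1 for a, b in zip(ratios_a, ratios_b)
        if abs(math.log2(a) - math.log2(b)) < limit
    )
    return close / len(ratios_a)

# Hypothetical duplicate measurements for illustration only.
replicate_1 = [1.0, 2.0, 4.0, 0.5, 8.0]
replicate_2 = [1.0, 2.0, 4.0, 0.5, 2.0]
agreement = fraction_within_factor(replicate_1, replicate_2)
r = pearson([math.log2(v) for v in replicate_1],
            [math.log2(v) for v in replicate_2])
```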

The changes in gene expression during 
this diauxic shift are complex and involve 
integration of many kinds of information 
about the nutritional and metabolic state 
of the cell. The large number of genes 
whose expression is altered and the diver- 
sity of temporal expression profiles ob- 
served in this experiment highlight the 
challenge of understanding the underlying 
regulatory mechanisms. One approach to 
defining the contributions of individual 
regulatory genes to a complex program of 
this kind is to use DNA microarrays to 
identify genes whose expression is affected 



Fig. 2. The section of the ar-
ray indicated by the gray box
in Fig. 1 is shown for each of
the experiments described
here. Representative genes
are labeled. In each of the ar-
rays used to analyze gene
expression during the diauxic
shift, red spots represent
genes that were induced rel-
ative to the initial timepoint,
and green spots represent
genes that were repressed
relative to the initial timepoint.
In the arrays used to analyze
the effects of the tup1Δ mu-
tation and YAP1 overexpres-
sion, red spots represent
genes whose expression was
increased, and green spots
represent genes whose ex-
pression was decreased by
the genetic modification. Note
that distinct sets of genes are
induced and repressed in the
different experiments. The
complete images of each of
these arrays can be viewed on
the Internet (13). Cell density
as measured by optical densi-
ty (OD) at 600 nm was used to
measure the growth of the
culture.

by mutations in each putative regulatory
gene. As a test of this strategy, we analyzed
the genomewide changes in gene expression
that result from deletion of the TUP1 gene.
Transcriptional repression of many genes by 
glucose requires the DNA-binding repressor 



Mig1 and is mediated by recruiting the tran-
scriptional co-repressors Tup1 and Cyc8/
Ssn6 (39). Tup1 has also been implicated in
repression of oxygen-regulated, mating-type-
specific, and DNA-damage-inducible genes
(40).

Fig. 3. Metabolic reprogramming inferred from global analysis of changes in gene expression. Only key
metabolic intermediates are identified. The yeast genes encoding the enzymes that catalyze each step
in this metabolic circuit are identified by name in the boxes. The genes encoding succinyl-CoA synthase
and glycogen-debranching enzyme have not been explicitly identified, but the ORFs YGR244 and
YPR184 show significant homology to known succinyl-CoA synthase and glycogen-debranching en-
zymes, respectively, and are therefore included in the corresponding steps in this figure. Red boxes with
white lettering identify genes whose expression increases in the diauxic shift. Green boxes with dark
green lettering identify genes whose expression diminishes in the diauxic shift. The magnitude of
induction or repression is indicated for these genes. For multimeric enzyme complexes, such as
succinate dehydrogenase, the indicated fold-induction represents an unweighted average of all the
genes listed in the box. Black and white boxes indicate no significant differential expression (less than
twofold). The direction of the arrows connecting reversible enzymatic steps indicates the direction of the
flow of metabolic intermediates, inferred from the gene expression pattern, after the diauxic shift. Arrows
representing steps catalyzed by genes whose expression was strongly induced are highlighted in red.

Wild-type yeast cells and cells bearing
a deletion of the TUP1 gene (tup1Δ) were
grown in parallel cultures in rich medium 
containing glucose as the carbon source. 
Messenger RNA was isolated from expo- 
nentially growing cells from the two pop- 
ulations and used to prepare cDNA la- 
beled with Cy3 (green) and Cy5 (red), 
respectively (11). The labeled probes were 
mixed and simultaneously hybridized to 
the microarray. Red spots on the microar- 
ray therefore represented genes whose 
transcription was induced in the tup1Δ
strain, and thus presumably repressed by
Tup1 (41). A representative section of the
microarray (Fig. 2, bottom middle panel)
illustrates that the genes whose expression
was affected by the tup1Δ mutation were,
in general, distinct from those induced
upon glucose exhaustion [complete images
of all the arrays shown in Fig. 2 are avail-
able on the Internet (13)]. Nevertheless,
34 (10%) of the genes that were induced
by a factor of at least 2 after the diauxic
shift were similarly induced by deletion of
TUP1, suggesting that these genes may be
subject to TUP1-mediated repression by
glucose. For example, SUC2, the gene en-
coding invertase, and all five hexose trans-
porter genes that were induced during the
course of the diauxic shift were similarly
induced, in duplicate experiments, by the
deletion of TUP1.

The set of genes affected by Tup1 in this
experiment also included α-glucosidases,
the mating-type-specific genes MFA1 and
MFA2, and the DNA damage-inducible
RNR2 and RNR4, as well as genes involved
in flocculation and many genes of unknown
function. The hybridization signal corre-
sponding to expression of TUP1 itself was
also severely reduced because of the (in-
complete) deletion of the transcription unit
in the tup1Δ strain, providing a positive
control in the experiment (42). 

Many of the transcriptional targets of
Tup1 fell into sets of genes with related
biochemical functions. For instance, al-
though only about 3% of all yeast genes
appeared to be TUP1-repressed by a factor
of more than 2 in duplicate experiments
under these conditions, 6 of the 13 genes
that have been implicated in flocculation
(15) showed a reproducible increase in
expression of at least twofold when TUP1
was deleted. Another group of related
genes that appeared to be subject to TUP1
repression encodes the serine-rich cell
wall mannoproteins, such as Tip1 and
Tir1/Srp1, which are induced by cold
shock and other stresses (43), and similar,
serine-poor proteins, the seripauperins
(44). Messenger RNA levels for 23 of the
26 genes in this group were reproducibly
elevated by at least 2.5-fold in the tup1Δ



www.sciencemag.org • SCIENCE • VOL 278 • 24 OCTOBER 1997 



strain, and 18 of these genes were induced 
by more than sevenfold when TUP1 was
deleted. In contrast, none of 83 genes that
could be classified as putative regulators of
the cell division cycle were induced more
than twofold by deletion of TUP1. Thus,
despite the diversity of the regulatory sys-
tems that employ Tup1, most of the genes
that it regulates under these conditions 
fall into a limited number of distinct func- 
tional classes. 
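
The contrast drawn above (about 3% of all genes appear TUP1-repressed, yet 6 of the 13 flocculation genes are) can be put on a rough quantitative footing with a binomial tail probability. The sketch below is an illustration added here, not a calculation reported by the authors; it assumes each gene in the class is independently affected with the genome-wide probability of 0.03, which ignores the finite size of the genome.

```python
from math import comb

def binomial_tail(k, n, p):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Figures taken from the text: 6 of 13 flocculation genes, 3% background rate.
chance_of_six_or_more = binomial_tail(6, 13, 0.03)
```

Under these assumptions the probability is on the order of one in a million, which is consistent with the statement that Tup1 targets cluster in a limited number of functional classes.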

Because the microarray allows us to 
monitor expression of nearly every gene in 
yeast, we can, in principle, use this ap- 
proach to identify all the transcriptional 
targets of a regulatory protein like Tup1. It
is important to note, however, that in any 
single experiment of this kind we can only 
recognize those target genes that are nor- 
mally repressed (or induced) under the 
conditions of the experiment. For in-
stance, the experiment described here an-
alyzed a MATα strain in which MFA1
and MFA2, the genes encoding the a-
factor mating pheromone precursor, are
normally repressed. In the isogenic tup1Δ
strain, these genes were inappropriately
expressed, reflecting the role that Tup1
plays in their repression. Had we instead
carried out this experiment with a MATa
strain (in which expression of MFA1 and
MFA2 is not repressed), it would not have
been possible to conclude anything re-
garding the role of Tup1 in the repression
of these genes. Conversely, we cannot dis-
tinguish indirect effects of the chronic
absence of Tup1 in the mutant strain from
effects directly attributable to its partici- 
pation in repressing the transcription of a 
gene. 

Another simple route to modulating the 
activity of a regulatory factor is to overex-
press the gene that encodes it. YAP1 en-
codes a DNA-binding transcription factor
belonging to the b-zip class of DNA-bind-
ing proteins. Overexpression of YAP1 in
yeast confers increased resistance to hydro-
gen peroxide, o-phenanthroline, heavy
metals, and osmotic stress (45). We ana-
lyzed differential gene expression between a
wild-type strain bearing a control plasmid
and a strain with a plasmid expressing YAP1
under the control of the strong GAL1-10
promoter, both grown in galactose (that is,
a condition that induces YAP1 overexpres-
sion). Complementary DNA from the con-
trol and YAP1-overexpressing strains, la-
beled with Cy3 and Cy5, respectively, was
prepared from mRNA isolated from the two
strains and hybridized to the microarray.
Thus, red spots on the array represent genes
that were induced in the strain overexpress-
ing YAP1.

Of the 17 genes whose mRNA levels 
increased by more than threefold when 



YAP1 was overexpressed in this way, five
bear homology to aryl-alcohol oxidoreduc- 
tases (Fig. 2 and Table 1). An additional 
four of the genes in this set also belong to 
the general class of dehydrogenases/oxi- 
doreductases. Very little is known about 
the role of aryl-alcohol oxidoreductases in 
S. cerevisiae, but these enzymes have been 
isolated from ligninolytic fungi, in which 
they participate in coupled redox reac- 
tions, oxidizing aromatic, and aliphatic 
unsaturated alcohols to aldehydes with the 
production of hydrogen peroxide (46, 47). 
The fact that a remarkable fraction of the 
targets identified in this experiment be- 
long to the same small, functional group of 
oxidoreductases suggests that these genes 

Fig. 4. Coordinated regulation of functionally related genes. The curves
represent the average induction or repression ratios for all the genes in
each indicated group. The total number of genes in each group was
as follows: ribosomal proteins, 112; translation elongation and initiation
factors, 25; tRNA synthetases (excluding mitochondrial synthetases), 17;
glycogen and trehalose [...]; cytochrome c oxidase and reductase [...];
and TCA [...].

Table 1. Genes induced by YAP1 overexpression. This list includes all the
genes for which mRNA levels increased by more than twofold upon YAP1
overexpression in both of two duplicate experiments, and for which the
average increase in mRNA level in the two experiments was greater than
threefold. Positions of the canonical Yap1 binding sites upstream of the
start codon and the average fold-increase in mRNA levels measured in the
two experiments are indicated.

might play an important protective role
during oxidative stress. Transcription of a
small number of genes was reduced in the
strain overexpressing Yap1. Interestingly,
many of these genes encode sugar per- 
meases or enzymes involved in inositol 
metabolism. 

We searched for Yap1-binding sites
(TTACTAA or TGACTAA) in the se-
quences upstream of the target genes we
identified (48). About two-thirds of the
genes that were induced by more than
threefold upon Yap1 overexpression had
one or more binding sites within 600 bases
upstream of the start codon (Table 1), sug-
gesting that they are directly regulated by
Yap1. The absence of canonical Yap1-bind-
ing sites upstream of the others may reflect
an ability of Yap1 to bind sites that differ
from the canonical binding sites, perhaps in
cooperation with other factors, or, less like-
ly, may represent an indirect effect of Yap1
overexpression, mediated by one or more
intermediary factors. Yap1 sites were found
only four times in the corresponding region
of an arbitrary set of 30 genes that were not
differentially regulated by Yap1.
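
The binding-site search described above (the two canonical heptamers within 600 bases upstream of the start codon) can be sketched as a plain substring scan. This is an illustration, not the authors' search: the text does not state whether both strands were examined, so scanning the reverse complement as well is an assumption, and the function names and the test sequence are hypothetical.

```python
YAP1_SITES = ("TTACTAA", "TGACTAA")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_yap1_sites(upstream, window=600):
    """Locate canonical Yap1 sites in the region 5' of a start codon.

    upstream: sequence immediately preceding the start codon, written
    5'->3', so that its last base is position -1.
    Returns sorted (position, strand, site) tuples, where position is the
    negative offset of the match start relative to the start codon.
    """
    region = upstream.upper()[-window:]
    hits = []
    for site in YAP1_SITES:
        # Assumption: a site on either strand counts as a hit.
        for motif, strand in ((site, "+"), (reverse_complement(site), "-")):
            start = region.find(motif)
            while start != -1:
                hits.append((start - len(region), strand, site))
                start = region.find(motif, start + 1)
    return sorted(hits)

# Hypothetical upstream sequence for illustration only.
sites = find_yap1_sites("GGTTACTAACCCCTTAGTCAGGG")
```

Running the same scan over a set of genes that did not respond to Yap1, as the text does with an arbitrary set of 30 genes, gives a baseline for how often these short motifs occur by chance.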

Use of a DNA microarray to character- 
ize the transcriptional consequences of 
mutations affecting the activity of regula- 
tory molecules provides a simple and pow- 
erful approach to dissection and character- 
ization of regulatory pathways and net- 



works. This strategy also has an important 
practical application in drug screening. 
Mutations in specific genes encoding can- 
didate drug targets can serve as surrogates 
for the ideal chemical inhibitor or modu- 
lator of their activity. DNA microarrays 
can be used to define the resulting signa- 
ture pattern of alterations in gene expres- 
sion, and then subsequently used in an 
assay to screen for compounds that repro- 
duce the desired signature pattern. 

DNA microarrays provide a simple and 
economical way to explore gene expres- 
sion patterns on a genomic scale. The 
hurdles to extending this approach to any
other organism are minor. The equipment
required for fabricating and using DNA
microarrays (9) consists of components
that were chosen for their modest cost and
simplicity. It was feasible for a small group
to accomplish the amplification of more
than 6000 genes in about 4 months and,
once the amplified gene sequences were in
hand, only 2 days were required to print a
set of 110 microarrays of 6400 elements
each. Probe preparation, hybridization,
and fluorescent imaging are also simple
procedures. Even conceptually simple ex-
periments, as we described here, can yield
vast amounts of information. The value of
the information from each experiment of 
this kind will progressively increase as 
more is learned about the functions of 
each gene and as additional experiments 
define the global changes in gene expres- 
sion in diverse other natural processes and 
genetic perturbations. Perhaps the greatest 
challenge now is to develop efficient 
methods for organizing, distributing, inter- 
preting, and extracting insights from the 
large volumes of data these experiments 
will provide.

41. Differences in mRNA levels between the tup1Δ and
[...] experiments. The correlation coefficient between the
complete sets of expression ratios measured in
these duplicate experiments was 0.83. The concor-
dance between the sets of genes that appeared to
[...] was very high between the two [...].
When only the 355 genes that showed at
least a twofold increase in mRNA in the tup1Δ strain
in either of the duplicate experiments were com-
pared, the correlation coefficient was [illegible].
42. The tup1Δ mutation consists of an insertion of the
LEU2 coding sequence, including a stop codon, be-
tween the ATG of TUP1 and an Eco RI site 124 base
pairs before the stop codon of the TUP1 gene.
43. L. R. Kowalski, K. Kondo, M. Inouye, Mol. Microbiol.
15, 341 (1995).

44. M. Viswanathan, G. Muthukumar, Y. S. Cong, J.
Lenard, Gene 148, 149 (1994).

46. A. Gutierrez, L. Caramelo, A. Prieto, M. J. Martinez,
A. T. Martinez, Appl. Environ. Microbiol. 60, 1783
(1994).

47. A. Muheim et al., Eur. J. Biochem. 195, 369 (1991).

48. J. A. Wemmie, M. S. Szczypka, D. J. Thiele, W. S.
Moye-Rowley, J. Biol. Chem. 269, 32592 (1994).

49. Microarrays were scanned using a custom-built
scanning laser microscope built by S. Smith with
software written by N. Ziv. Details concerning scan-
ner design and construction are available at
cmgm.stanford.edu/pbrown. Images were scanned at a
resolution of 20 µm per pixel. A separate scan, using
the appropriate excitation line, was done for each of
the two fluorophores used. During the scanning pro-
cess, the ratio between the signals in the two chan-
nels was calculated for several array elements con-
taining total genomic DNA. To normalize the two
channels with respect to overall intensity, we then
adjusted photomultiplier and laser power settings
such that the signal ratio at these elements was as
close to 1.0 as possible. The combined images were
analyzed with custom-written software. A bounding
box, fitted to the size of the DNA spots in each
quadrant, was placed over each array element. The
average fluorescent intensity was calculated by sum-
ming the intensities of each pixel present in a bound-
ing box, and then dividing by the total number of
pixels. Local area background was calculated for
each array element by determining the average fluo-
rescent intensity for the lower 20% of pixel intensi-
ties. Although this method tends to underestimate
the background, causing an underestimation of ex-
treme ratios, it produces a very consistent and noise-
tolerant approximation. Although the analog-to-
digital board used for data collection possesses a
wide dynamic range (12 bits), several signals were
saturated (greater than the maximum signal intensity
allowed) at the chosen settings. Therefore, extreme
ratios at bright elements are generally underestimat-
ed. A signal was deemed significant if the average
intensity after background subtraction was at least
2.5-fold higher than the standard deviation in the
background measurements for all elements on the
array.
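
The quantitation rules in note 49 (mean pixel intensity per bounding box, local background taken as the mean of the lowest 20% of pixels, and a significance cutoff of 2.5 standard deviations of the background) can be sketched as follows. This is a minimal illustration, not the custom software mentioned in the note. It assumes that "the standard deviation in the background measurements for all elements" means the spread of the per-element background values; the function names and pixel values are hypothetical.

```python
import statistics

def spot_mean(pixels):
    """Average intensity of all pixels inside one bounding box."""
    return sum(pixels) / len(pixels)

def local_background(pixels, fraction=0.20):
    """Mean of the dimmest `fraction` of pixels in the bounding box."""
    ordered = sorted(pixels)
    count = max(1, int(len(ordered) * fraction))
    return sum(ordered[:count]) / count

def significant_spots(boxes, fold=2.5):
    """Flag each array element whose background-subtracted signal is at
    least `fold` times the standard deviation of the background values.

    boxes: list of pixel-intensity lists, one per array element.
    """
    backgrounds = [local_background(pixels) for pixels in boxes]
    # Assumption: noise is the spread of per-element backgrounds.
    noise = statistics.pstdev(backgrounds)
    return [
        (spot_mean(pixels) - background) >= fold * noise
        for pixels, background in zip(boxes, backgrounds)
    ]

# Hypothetical pixel intensities for three array elements.
example_boxes = [
    [10] * 8 + [200, 200],   # bright spot over a dim background
    [12] * 8 + [13, 13],     # barely above background
    [8] * 10,                # empty element
]
flags = significant_spots(example_boxes)
```

As the note itself points out, taking the lowest 20% of pixels tends to underestimate the background, so ratios computed from these values are conservative at the extremes.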

50. In addition to the 17 genes shown in Table 1, three
additional genes were induced by an average of
more than threefold in the duplicate experiments, but
in one of the two experiments, the induction was less
than twofold (range 1.6- to 1.9-fold).
[...] helpful advice on Yap1; and S. Klapholz and the [...]

Human Genome Placed on Chip; Biotech Rivals Put It
for Sale



By ANDREW POLLACK (NYT) 1030 words 
The genome on a chip has arrived. 



Melding high technology with biology, several companies are rushing to sell slivers of glass or 
nylon, some as small as postage stamps, packed with pieces of all 30,000 or so known human 
genes. 

The new products will allow scientists to scan all genes in a human tissue sample at once to 
determine which genes are active, a job that previously required two or more chips. The whole-
genome chips will lower the cost and increase the speed of a widely used test that has 
transformed biomedical research in the last few years. 

"It's sort of a milestone event, very similar to generating an integrated circuit of the genome," 
said Stephen P. A. Fodor, the chief executive of Affymetrix Inc., the leading seller of gene chips,
which are also called microarrays.

Affymetrix, based in Santa Clara, Calif., is expected to announce today that it is accepting orders
for its whole-genome chip.

The announcement seems timed to steal some thunder from the rival Agilent Technologies,
which is based in nearby Palo Alto. Agilent is to be the host of an analyst meeting today, and it
plans to announce then that it has started shipping test versions of its whole-genome chip.

Applied Biosystems of Foster City, Calif., a unit of the Applera Corporation, started the race in 
July with an announcement that it would have a whole-genome chip out by the end of this year.
NimbleGen Systems, a small company in Madison, Wis., announced a few days later that it had a 
genome on a chip that it was not selling but that it was using to run tests for customers. 

Gene chips, which detect genes that are active, meaning they are being used to make a protein,
have become essential tools. Scientists try to understand the genetic mechanisms of disease by 
seeing which genes are turned on in, say, a sick kidney or lung compared with those active in a 
healthy organ. Pharmaceutical companies look at gene activity patterns to try to predict the 
effects of drugs. 



Scientists have found that tumors that look the same under the microscope can differ in terms of
which genes are active. So studying gene patterns could become a way to discriminate between
deadly and not-so-deadly tumors, or to predict which drug will work best for a particular patient.

The vendors conceded that the change from two chips to one is more symbolic than
revolutionary.

"You can do just as good science with two chips, it costs you a little more," said Roland Green,
the vice president for research and development at NimbleGen.

Some scientists questioned whether the chips really have all human genes, because the exact
number and identities of all the genes is not known.

The advent of the genome on a chip is, however, evidence that biotechnology, to the extent that it
uses electronics, is experiencing some of the rapid progress that has made semiconductors and
computers continuously cheaper and smaller.

"One of the effects everyone is looking for in the genomics area is Moore's law - more data, less
money," said Doug Dolginow, an executive vice president at Gene Logic, which sells data from
gene chip studies to pharmaceutical companies. "This is a step in that direction."

Moore's law states that the number of transistors on a semiconductor chip doubles every 18
months.

Affymetrix's gene chips are, in fact, made with the same techniques used to make semiconductor
chips. In the mid-1990's, the company came out with a set of five chips covering what was then
known of the human genome. After the human genome sequence was virtually completed in
2000, the company developed a two-chip set with all the known genes. Now it has the single
chip, which some scientists say will be more convenient.

"[...] be able to look at all genes at one time to get a global view of what's going on," said
John R. Walker, who runs gene chip operations at the Genomics Institute of the Novartis
Research Foundation in San Diego.

Costs should also be lower. Gene chips have been so expensive that many academic scientists
[...]. Affymetrix said it would sell the whole-genome chips
for $300 to $500 each, depending on volume, little more than half the price of the two-chip set.
The other companies have not announced prices.

For Affymetrix, a successful whole-genome chip "is essential for them to maintain their
dominance of high-end microarrays," said Edward A. Tenthoff, an analyst at U.S. Bancorp Piper
Jaffray. Affymetrix had total product sales in 2002 of about $250 million, and a company 
spokesman said that human genome chips are its top-selling product. 

Mr. Tenthoff, who recommends Affymetrix stock, said the company's sales growth rate had 
moderated as it faces tougher competition. Agilent, a spinoff of Hewlett-Packard that makes its 
gene chips by printing DNA components onto glass slides using ink jet printers, has gained 
share, he said. Applied Biosystems, the largest maker of genomics equipment over all, will be
entering the microarray segment of the business with its whole-genome chip, emphasizing the
connection of that product to the others it offers, including the gene database developed by its 
sister company, Celera Genomics. 

Jeffrey Trent, scientific director of the Translational Genomics Research Institute in Phoenix,
said that while whole-genome chips are useful for medical discovery, the biggest growth of the
market will be for chips that can be used by doctors to do diagnoses. And whole-genome chips 
are too cumbersome for that, he said. Rather, once scientists use the whole-genome chips to find 
particular genes that are associated with, say, tumor aggressiveness or drug effectiveness, he 
said, they will then make smaller and cheaper chips containing just those genes for use in 
diagnosis.

Agilent Technologies ships whole human genome on single
microarray to gene expression customers for evaluation

Company to introduce first commercial whole human [...]

PALO ALTO, Calif., Oct. 2, 2003

[...] whole human-genome microarrays [...] Agilent's new double-
density format, which [...] microarray. The
new platform enables drug-discovery and disease researchers to perform whole-genome screening at a
lower cost and with higher reproducibility.

"This is an important step toward our release of the [...] microarray product, which
is expected to be available for order by the end of [...]," said [...] Saunders, vice president
and general manager of Agilent's BioResearch Solutions Unit. "[...] have long wanted a one-
sample, one-chip format with the increased sensitivity [...] probes. The cost savings
and high-quality performance make [...] a compelling alternative for scientists who make their
own microarrays."

Agilent's microarrays are based on the industry-standard 1" x 3" [...] format, which is
compatible with most commercial microarray scanners [...]. Agilent's commercial microarrays are developed
using content from public databases [...] sequence and annotation
information made available to customers. Gene sequences [...] using algorithms
and then validated empirically [...]. The result is a microarray
comprised of functionally validated probes [...] up-to-date and comprehensive genome
information commercially available.



Advantages of the double-density format include:

• Lower cost. Not only is one microarray less expensive than two, it requires fewer reagents and
reduces instrumentation demands.

• [...]

• [...] sample use. A smaller quantity of sample material is [...] to perform an experiment.

Availability 

Agilent's Whole Human Genome Microarray is expected to be available for order by the end of the year. 
on the Web at www.agilent.com
Forward-Looking Statements



This news release contains forward-looking statements (including, without limitation, statements relating
to [...] whole-genome microarray platform will be available for order by the
end of 2003) that involve risks and uncertainties that could cause results to differ materially from
management's current expectations. These and other risks are detailed in the company's filings with the
Securities and Exchange Commission, including its Annual Report on Form 10-K for the year ended Oct.
31, 2002, its Quarterly Report on Form 10-Q for the quarter ended July 31, 2003 [...]
AFFYMETRIX GENECHIP(R) BRAND HUMAN GENOME U133 PLUS 2.0 ARRAY

[...] GeneChip(R) Brand Human Genome U133 Plus 2.0

SANTA CLARA, CA USA 10/02/2003

SANTA CLARA, Calif., Oct. 2 /PRNewswire/ - Affymetrix, Inc.
(Nasdaq: AFFX) announced today that it is taking [...]
GeneChip(R) brand Human Genome U133 Plus 2.0 Array [...]
protein-coding content of the human genome [...] commercially available
[...]

(Photo: http://www.newscom.com/cgi-bin/prnh/20031002/SFTH021)

[...] the Human [...] of a human [...]
further demonstrates the unique power and [...] to
explore vast areas of the [...] even as our data
capacity increases dramatically [...]

The HG-U133 Plus 2.0 Array, which will ship in October, combines the
content of the previous HG-U133 two-array set with nearly 10,000 new probe
sets representing about 6,500 new genes, for a total of nearly 50,000
transcripts and variants. This new information, verified against the [...]
version of the publicly available genome map, provides researchers the [...]

[...]

The [...] U133 Plus 2.0 Array is identical to the previous [...]

[...] previous-generation Affymetrix
human product, the HG-U133 Plus 2.0 [...]

[...]
[...] technology. Using multiple,
independent measurements provides optimal [...] specificity and the
most accurate, consistent, statistically [...]

"More data points produce more reliable results [...] ultimately, enable
better science," said Nicholls. "Our powerful probe set strategy gives our
customers the assurance that their array results actually reflect what's in
their sample."

Affymetrix is also launching an 11-micron version of its popular
18-micron HG-U133A Array called the GeneChip HG-U133A 2.0 Array. The reduced
feature size on this new design means that [...] can use smaller sample
volumes than on the previous [...] without compromising performance.
This new array represents over 20,000 transcripts [...] used to explore
human biology and disease processes. All probe sets represented on the
original GeneChip HG-U133A Array are identically replicated on the GeneChip
HG-U133A 2.0 Array.

More information on the design of the HG-U133 [...]
http://www.affymetrix.com.

Affymetrix will be presenting further information on this and other
products at the BioTechnica trade show [...] on Oct. 7-9, 2003.
The Company will also hold a press conference [...] from 11 a.m. to
12 p.m. at the show regarding the [...]. If you
would like to attend the press conference, [...] contact Caroline Stupnicka
at c.stupnicka@northbankcommunications.com.

About Affymetrix:

[...] breakthrough tools that are driving the [...].
[...] of semiconductor technology
to the life sciences, Affymetrix develops and commercializes systems that
enable scientists to improve the quality of life. The Company's customers
include pharmaceutical, biotechnology, agrichemical, diagnostics and consumer
products companies as well as academic, government and other non-profit
research institutes. Affymetrix offers an expanding portfolio of integrated
products and services, including its [...] platform, to address
growing markets focused on understanding the relationship between genes and
human health. Additional information on Affymetrix can be found at
http://www.affymetrix.com.

All statements in this press release that are not historical are
"forward-looking statements" within the meaning of Section 21E of the
Securities Exchange Act as amended, including statements regarding Affymetrix'
"expectations," "beliefs," "hopes," "intentions," "strategies" or the like.
Such statements are subject to risks and uncertainties that could cause actual
results to differ materially for Affymetrix from those projected, including,
but not limited to risks of the Company's ability to achieve and sustain
higher levels of revenue, higher gross margins, reduced operating expenses,
uncertainties relating to technological approaches, manufacturing, product
development, market acceptance (including uncertainties relating to product
development and market acceptance of the GeneChip Human Genome U133 Plus 2.0 Array
and the HG-U133A 2.0), personnel retention, uncertainties related to cost and
pricing of Affymetrix products, dependence on collaborative partners,
uncertainties relating to sole source suppliers, uncertainties relating to FDA
and other regulatory approvals, competition, risks relating to intellectual
property of others and the uncertainties of patent protection and litigation.
These and other risk factors are discussed in Affymetrix' Form 10-K for the
year ended December 31, 2002 and other SEC reports, including its Quarterly
Reports on Form 10-Q for subsequent quarterly periods. [...]
disclaims any obligation or undertaking to release publicly any updates or
revisions to any forward-looking statements [...]
events, conditions, or circumstances on which any such statements are based.
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Macroresults through Microarrays 

John C. Rockett, Reproductive Toxicology Division (MD-72), National Health and Environmental Effects Research Laboratory Office of Research 
and Development, US Environmental Protection Agency, Research Triangle Park, 2525 East Highway 54, Durham, NC 27711, USA.
tel: +1 919 541 2071, fax: +1 919 541 4017, e-mail: rockett.john@epa.gov



The third enactment of Cambridge 
Healthtech Institute's Macroresults 
through Microarrays meeting was held 
in Boston (MA, USA) from 29 April-
1 May 2002. The subtheme of this year's 
meeting was 'advancing drug discov- 
ery', a widely touted application for 
array technology. 



The evolution of microarrays
If you were asked 'Who first conceived
of the idea of microarrays', who would
come to mind? Mark Schena perhaps, 
first author of the seminal 1995 paper 
on cDNA arrays [1]? Maybe Pat Brown, 
Schena's then supervisor? Or perhaps 
Stephen Fodor, the primary driver 
behind Affymetrix's (http://www. 
affymetrix.com) oligonucleotide-based 
platform [2]. Brits might even chant the 
name of Ed Southern [3]. Well, accord- 
ing to Roger Ekins (University College 
London Medical School; http://www.
ucl.ac.uk/medicine/) all these answers 
would be wrong. It was in fact Ekins 
and his colleagues who first conceived 
of and patented 'a new generation of 
ultrasensitive, miniaturized assays for 
protein and DNA-RNA measurement 
based on the use of microarrays' in the 
mid 1980s [4]. The concept and poten-
tial of array technology was more fully
described in a later publication, in
which Ekins et al. [5] concluded that an-
tibody microspots of ~50 µm² could be
achieved, and that as many as 2 million
different immunoassays could, in prin-
ciple, be accommodated on a surface
area of 1 cm².
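
The arithmetic behind this estimate can be checked in a few lines. This is only a sketch: it reads the microspot size as an area of roughly 50 µm² and assumes idealized, gap-free packing. The function name `max_spots` is illustrative and does not come from Ekins et al.

```python
# Idealized check: how many ~50 µm² microspots fit on 1 cm² with no gaps?
UM2_PER_CM2 = 10_000 ** 2  # 1 cm = 10,000 µm, so 1 cm² = 1e8 µm²


def max_spots(area_cm2: float, spot_area_um2: float) -> int:
    """Upper bound on the spot count, assuming perfect gap-free packing."""
    if spot_area_um2 <= 0:
        raise ValueError("spot_area_um2 must be positive")
    return int(area_cm2 * UM2_PER_CM2 // spot_area_um2)


print(max_spots(1.0, 50.0))  # 2000000
```

Under those assumptions the bound reproduces the 2 million figure. Real arrays need spacing between spots, so practical densities would be lower.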

Technological innovation

In practice, it took a different biological
molecule (DNA), a different research
group, and a leap into microfabri-
cation technology to even begin
approaching these kinds of densities
[Affymetrix patent 6045996 talks of
one million spots cm⁻²]. Of course,
advancing technology is one of the 
driving engines behind the genomics 
juggernaut, and we are already seeing 
'4th generation' machines for fab- 
ricating DNA chips. If the company 
representatives at this meeting are to 
be believed (and their cases seemed 
strong), spotting is out, and in situ 
fabrication of oligonucleotide-based 
'iterative custom arrays' is in. Whether 
you go with the Combimatrix's (http:// 
www.combimatrix.com) electrochemi- 
cally directed synthesis and detection 
system, febit's (http://www.febit.com) 
Geniom® technology, or Nimblegen's
(http://www.nimblegen.com) Maskless 
Array Synthesizer technology is a 
matter of personal choice. However, 
each of these machines provides the 
flexibility to design variable length 
oligonucleotide probes from se- 
quences inputted by the user, and then 
perform in situ synthesis of an array. 
Each system also boasts unique advan- 
tages. For example, Combimatrix's 
biological array processor is a semi- 
conductor coated with a 3D layer 
of porous material in which DNA, 
RNA, peptides or small molecules 
can be synthesized or immobilized 
within discrete test sites, while febit's 
Geniom One® is a fully integrated
gene-expression analysis system with 
minimal user hands-on time - the 
probe sequences are programmed, the 
RNA samples inserted, and the gene 
expression data is pumped out a few
hours later. 



Cell- and tissue-based arrays 
Array technology is in most people's 
minds firmly linked with gene-expression 
profiling. Fewer are aware that cell- and 
tissue-based arrays have been devel- 
oped, and how they can provide 
a vital extra dimension to research. In 
support of this, Barry Bochner gave an 
update on the cell-based array system 
that Biolog (http://www.biolog.com) 
has produced for simultaneously mea- 
suring the effects of one gene in the cell 
under thousands of growth conditions 
(see [6] for further details). David Walt 
(Tufts University; http://www.tufts. 
edu/) is developing single live cell ar- 
rays using optical imaging fiber (OIF) 
technology. An array of microwells is 
fabricated on the face of an OIF at den-
sities of up to 10 million wells cm⁻².
Cells are then added to the wells and 
disperse at an average of one cell per 
well. Physiological and genetic re- 
sponses of each cell are measured via 
fluorescence produced by reporter 
genes (e.g. lacZ, gfp). Assays performed
so far include yeast live or dead cell
assay, microenvironment pH and
O2 measurements, promoter responses
using the lacZ and phoA reporter genes,
and protein-protein interactions using 
the yeast two-hybrid system. The main 
advantage of this system is that the cells 
remain alive during the assay, which 
means a real-time timecourse can be 
performed and/or the array passed 
from sample to sample. This would be 
useful in, for example, the scanning of 
a combinatorial drug library for specific
physiological effects. 

Tissue arrays are a useful complemen- 
tary technology to DNA arrays because 
they can be used to help validate and
understand the biological and medical
significance of gene changes discov- 
ered using standard DNA arrays. For 
example, an array of tumor tissues can 
be screened for the protein (using im- 
munohistochemistry), message (using 
in situ hybridization) and copy number
(using comparative genomic hybridiza- 
tion) of a gene of interest, to determine 
if expression of the gene (or lack 
thereof) is related in any way to sur- 
vival. They can also be used to predict 
the probability of clinical failure of lead 
compounds as a result of toxicity by 
evaluating the distribution of the drug 
targets in normal tissue. Spyro Mousses 
and his co-workers at the National 
Human Genome Research Institute 
(http://www.nhgri.nih.gov/index.html) 
have built such arrays, including a 
multi-tumor array (~5000 specimens,
and sections from 36 normal and 800 
metastatic tissues) and a normal tissue 
array (76 tissue and 332 cell types). 

The problem with proteins 
It has been said that genomics tells us 
what might happen, transcriptomics 
indicates what should happen, and pro- 
teomics shows what is happening. The 
impact of functional proteomics on 
pharmaceutical R&D is rapidly increas- 
ing, and protein arrays are being used 
increasingly in both basic and applied 
research. Their use lies not only in com- 
parative protein expression and inter- 
action profiling, but also in diagnostics 
and drug discovery. However, an in- 
creasing number of researchers have 
found that protein arrays, like their 
cousins the DNA arrays, present several 
practical obstacles relating to their pro-
duction and use. For example, in using
Escherichia coli to produce recombi-
nant eukaryotic proteins from a single
expression vector, multiple protein
products are often produced, suggest-
ing mixes of truncated or otherwise
altered proteins. There is also the obvi-
ous concern that the proteins might
not be modified in a similar manner to
eukaryotic systems. Also, an optimal
method for depositing and binding 
proteins to the selected substrate is 
yet to be determined, as is the best 
way to ensure that they are bound in a 
correctly folded, active conformation. 

Several companies have been address- 
ing these problems. Prolinx (http:// 
www.prolinxinc.com) is one such com-
pany, and Karin Hughes described their
Versalinx™ chemistry for producing 
protein, peptide and small-molecule 
arrays. Versalinx™ uses solution-phase 
conjugation followed by immobiliza- 
tion, resulting in functional orientation 
of proteins and peptides on the sub- 
strate surface. It also offers the valuable 
additional benefit of exhibiting low 
non-specific binding. Sense Proteomic 
(http://www.senseproteomic.com) is 
also among those addressing these 
problems to develop robust protein 
arrays for drug discovery and clinical 
applications and has developed func- 
tional protein array formats based on 
specific disease tissues. Subtractive hy- 
bridization is used to identify genes 
with altered expression in breast tumor 
and cystic fibrosis compared to normal 
tissue. A high throughput cloning strat- 
egy (COVET™) is then used to produce 
libraries of genes that are tagged, 
cloned, expressed, purified and finally 
immobilized on glass slides. Initial vali- 
dation studies have shown that the vast 
majority of the immobilized proteins do 
indeed retain biological function. 

Stefan Schmidt and his company 
(GPC Biotech; http://www.gpcbiotech. 
de) have moved past the platform devel- 
opment stage and, with their focus 
firmly on drug discovery, are currently 
developing kinase-profiling arrays. 
Kinases are important targets for phar- 
maceutical drug discovery and therapy, 
and GPC's aim is to simultaneously de-
tect multiple kinases, obtain activity pro-
files for different cell types, or analyze
the ability of drug candidates to inhibit 
kinase activity. To do this, recombinant 
kinase substrates are immobilized on
membranes, incubated with purified
kinase, and the substrates measured for
the degree of phosphorylation. 

Summary 

Meetings like this, packed with exciting 
discoveries and intriguing and interest- 
ing innovation, heavily emphasize the 
pace at which biotechnology is advanc- 
ing, to the extent that the number of 
options for genomic and proteomic re- 
searchers can become overwhelming. 
Although data analysis is perhaps the 
greatest current concern for array users, 
an increasing challenge will be to deter- 
mine the approaches and technology 
that really work, and to do it in a timely 
manner. 
A standard two-dimensional (2-D) protein map of Fischer 344 rat liver
(F344MST3) is presented, with a tabular listing of more than 1200 protein species.
Sodium dodecyl sulfate (SDS) molecular mass and isoelectric point have been es-
tablished, based on positions of numerous internal standards. This map has been
used to connect and compare hundreds of 2-D gels of rat liver samples from a va-
riety of studies, and forms the nucleus of an expanding database describing rat
liver proteins and their regulation by various drugs and toxic agents. An example
of such a study, involving regulation of cholesterol synthesis by cholesterol-lower-
ing drugs and a high-cholesterol diet, is presented. Since the map has been ob-
tained with a widely used and highly reproducible 2-D gel system (the Iso-Dalt®
system), it can be directly related to an expanding body of work in other laborato-
ries.
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1 Introduction 

High-resolution two-dimensional electrophoresis of pro- 
teins, introduced in 1975 by O'Farrell and others [1-4], has 
been used over the ensuing 16 years to examine a wide va- 
riety of biological systems, the results appearing in more 
than f 000 published papers. With the advent of computer- 
ized systems for analyzing two-dimensional (2-D) gel ima- 
ges and constructing spot databases, it is also possible to 
plan and assemble integrated bodies of information de- 
scribing the appearance and regulation of thousands of pro- 
tein gene products [5, 6]. Creating such databases involves 
amassing and organizing quantitative data from thousands 
of 2-D gels, and requires a substantial commitment in tech- 
nology and resources. 

Given the long-term effort required to develop a protein da- 
tabase, the choice of a biological system takes on consider- 
able importance. While in vitro systems are ideal for answer- 
ing many experimental questions, especially in cancer re- 
search and genetics, our experience with cell cultures and 
tissue samples suggests that some in vivo approaches could 
have major advantages. In particular, we have noticed that 
liver tissue samples from rats and mice appear to show grea-
ter quantitative reproducibility (in terms of individual pro-
tein expression) than replicate cell cultures. This is perhaps
a natural result of the homeostasis maintained in a com- 
plete animal vs. the well-known variability of cell cultures, 
the latter due principally to differences in reagents (e.g.,
fetal bovine serum), conditions (e.g., pH) and genetic "evo-
lution" of cell lines while in culture. It is also more difficult
to generate adequate amounts of protein from cell culture 
systems (particularly with attached cells), forcing the inves- 
tigator to resort to radioisotope-based or silver-based stain- 
detection methods. While these methods are more sensi- 
tive (sometimes much more sensitive) than the Coomassie 
Brilliant Blue (CBB) stain typically used for protein detec- 
tion in "large" protein samples, they are generally more vari- 
able, more labor-intensive and, in the case of radiographic 
methods, may generate highly "noisy" images, due to the
properties of the films used. By contrast, large protein sam- 
ples can easily be prepared from liver using urea/Nonidet 
P-40 (NP-40) solubilization and stained with CBB, which 
has the advantage of being easily reproducible [8]. Finally, 
there remains the question of the "truthfulness" of many in
vitro systems as compared to their in vivo analogs; how
great are the changes caused by the introduction into a cul-
ture and the associated shift to strong selection for growth,
and how do these affect experimental outcomes? Hence 
the apparent advantages of in vitro systems, in terms of ex-
perimental manipulation, may be counterbalanced by
other factors relating to 2-D data quality. 

There is a second important class of reasons for exploring 
the use of an in vivo biological system such as the liver. His-
torically there have been two broad approaches to the me-
chanistic dissection of biochemical processes in intact cel-
lular systems: genetics (a search for informative mutants)
and the use of chemical agents (drugs and chemical toxins). 
Both approaches help us to understand complex systems 
by disrupting some specific functional element and show- 
ing us the result. With the development of techniques for 
genetic manipulation and cloning, the genetic approach 
can be effectively applied either in vitro or in vivo, although 
the in vitro route is usually quicker. The chemical approach
can also be applied to either sort of biological system; here,
however, the bulk of consistently acquired information is
in experimental animals (rats and mice). While most biolo-
gists know a short list of compounds having specific, experi-
mentally useful effects (e.g., inhibitors of protein synthesis,
ionophores, polymerase inhibitors, channel blockers, nu- 
cleotide analogs, and compounds affecting polymerization 
of cytoskeletal proteins), there is a much larger number of 
interesting chemically-induced effects, most of them char- 
acterized by toxicologists and pharmacologists in rodent 
systems. Just as a thorough genetic analysis would involve 
saturating a genome with mutations, it is possible to ima- 
gine a saturating number of drugs, the analysis of whose ac- 
tions would reveal the complete biochemistry of the cell. 
While organized drug discovery efforts usually target spe-
cific desired effects, the nature of the process, with its de-
pendence on screening large numbers of compounds, ne-
cessarily produces many unanticipated effects. It is there-
fore reasonable to suppose that the required broad range of
compounds necessary to achieve "biochemical saturation" 
may be forthcoming; in fact, it may already exist among the 
hundreds of thousands of compounds that failed to qualify 
as drugs. 

Among organs, the liver is an obvious choice for the study 
of chemical effects because of its well-known plasticity and 
responsiveness. The brain appears to be quite plastic (e.g.,
[7]), but it is a complicated mixture of cell types requiring
skillful dissection for most experiments. The kidney, while 
quite responsive, also presents a potentially confounding 
mixture of cell types. The liver, by contrast, is made up of
one predominant cell type which is easy to solubilize: the 
hepatocyte, representing more than 95% of its mass. Most 
importantly, the liver performs many homeostatic func- 
tions that require rapid modulation of gene expression. It 
appears that most chemical agents tested affect gene ex- 
pression in the liver at some dosage (N. Leigh Anderson, 
unpublished observations), an interesting contrast to our 
earlier work with lymphocytes, for example, which seem to 
be much less responsive. Such results conform to the expec- 
tation that cells with a homeostatic, physiological role 
should be more plastic than cells differentiated for a pur-
pose dependent on the action of a limited number of spe-
cific genes.

The liver also allows the parallels between in vitro and in 
vivo systems to be examined in detail. Significant progress
has been made in the development of mouse, rat and hu-
man hepatocyte culture systems as well as in precision-cut
tissue slices. Using such an array of techniques, it is possi- 
ble to assemble a matrix of mammalian systems including 
mouse and rat in vivo on one level and mouse, rat and hu- 
man in vitro on a second level, and to compare effects be- 
tween species and between systems. This approach allows 
us to draw informed conclusions regarding the biochemical
"universality" of biological responses among the mammals,
and to offer some insight into the validity of in vitro ap- 
proaches for toxicological screening. We believe this data 
will be necessary if in vitro alternatives are to achieve wide 
usage in government-mandated safety testing of drugs, con- 
sumer products and industrial and agricultural chemicals. 

A number of interesting studies have been published using 
2-D mapping to examine effects in the rodent liver. A num-
ber of investigators have made use of the technique to
screen for existing genetic variants [8-11] or induced muta-
tions [12-14], mainly in the mouse. This work builds on the
wealth of genetic information available on the mouse and 
its established position as a mammalian mutation-detec- 
tion system. While some studies of chemical effects have 
been undertaken in the mouse [15-17], most have used the 
rat [18-23]. The examination of the cytochrome p-450 sys- 
tem, in particular, has been carried out almost exclusively 
on the rat [24, 25]. 

These considerations lead us to conclude that rodent liver 
offers the best opportunity to systematically examine an 
array of gene regulation systems, and ultimately to build a 
predictive model of large-scale mammalian gene control. 
The basic underlying foundation of such a project is a reli- 
able, reproducible master 2-D pattern of liver, to which on- 
going experimental results can be referred. In this paper, we 
report such a master pattern for the acidic and neutral pro-
teins of rat liver (pattern F344MST3). In future, this master
will be supplemented by maps of basic proteins, and analog-
ous maps of mouse and human liver.



2 Materials and methods 
2.1 Sample preparation 

Liver is an ideal sample material for most biochemical stud- 
ies, including 2-D analysis. A sample is taken of approxima- 
tely 0.5 g of tissue from the apical end of the left lobe of the 
liver. Solubilization is effected as rapidly as practical; a 
delay of 5—15 min appears to cause no major alteration in 
liver protein composition if the liver pieces are kept cold
(e.g., on ice) in the interim. In the solubilization process, 
the liver sample is weighed, placed in a glass homogenizer 
(e.g., 15 mL Wheaton); 8 volumes of solubilizing solution* 

* The solubilizing solution is composed of 2% NP-40 (Sigma), 9 M urea
(analytical grade, e.g., BDH or Bio-Rad), 0.5% dithiothreitol (DTT;
Sigma) and 2% carrier ampholytes (pH 9-11, LKB; these come as a 20%
stock solution, so 2% final concentration is achieved by making the final
solution 10% 9-11 Ampholine by volume). A large batch of solubilizer
(several hundred mL) is made and stored frozen at -80°C in aliquots
sufficient to provide enough for one day's estimated sample prepara-
tion requirement. The solution is never allowed to become warmer
than room temperature at any stage during preparation or thawing for
use, since heating of concentrated urea solutions can produce contami-
nants that covalently modify proteins, producing artifactual charge
shifts. Once thawed, any unused solubilizer is discarded.
is added (i.e., 4 mL per 0.5 g tissue), and the mixture is ho-
mogenized using first the loose- and then the tight-fit-
ting glass pestle. This takes approximately 5 strokes with
each pestle and is carried out at room temperature because
urea would crystallize out in the cold. Once the liver sample
is thoroughly homogenized in the solubilizer, it is assumed
that all the proteins are denatured (by the chaotropic effect
of the urea and NP-40 detergent) and the enzymes inacti-
vated by the high pH (~9.5). Therefore these samples may
be kept at room temperature until they can be centrifuged
or frozen as a group (within several hours of preparation).
The samples are centrifuged for 6 × 10⁶ g·min (e.g., 500 000
× g for 12 min using a Beckman TL-100 centrifuge). The
centrifuge rotor is maintained at just below room tempera-
ture (e.g., 15-20°C), but not too cold, so as to prevent the
precipitation of urea. The centrifuge of choice is a Beckman
TL-100 because of the sample tube sizes available, but any
ultracentrifuge accepting smallish tubes will suffice. When
an appropriate centrifuge is not available near the site of
sample preparation, samples can be frozen at -80°C and
thawed prior to centrifugation and collection of superna-
tants. Each supernatant is carefully removed following cen-
trifugation and aliquoted into at least 4 clean tubes for stor-
age. This is done by transferring all the supernatant to one
clean tube, mixing this gently (to assure homogeneous
composition) and then dividing it into 4 aliquots. The ali-
quots are frozen immediately at -80°C. These multiple ali-
quots can provide insurance against a failed run or a freezer
breakdown.
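
The quantities in this protocol follow from simple proportions, which the sketch below works through. It is illustrative only: the function names are hypothetical, 1 g of tissue is treated as roughly 1 mL when counting "volumes", and the last helper assumes that runs with equal g·min products are interchangeable, which the text implies by quoting the dose in g·min but does not state outright.

```python
def solubilizer_volume_ml(tissue_g: float, volumes: float = 8.0) -> float:
    """Solubilizer volume, counting 1 g of tissue as about 1 mL."""
    return tissue_g * volumes


def stock_fraction(final_pct: float, stock_pct: float) -> float:
    """Fraction of the final volume taken up by stock (C1*V1 = C2*V2)."""
    return final_pct / stock_pct


def g_minutes(rcf_g: float, minutes: float) -> float:
    """Centrifugation dose as relative centrifugal force times time."""
    return rcf_g * minutes


def minutes_for(target_g_min: float, rcf_g: float) -> float:
    """Run time needed to reach a target dose at a given force."""
    return target_g_min / rcf_g


print(solubilizer_volume_ml(0.5))  # 4.0 mL for a 0.5 g sample
print(stock_fraction(2.0, 20.0))   # 0.1, i.e. 10% stock by volume
print(g_minutes(500_000, 12))      # 6000000
```

For example, `minutes_for(6_000_000, 100_000)` gives the run time a slower rotor would need under the equal-dose assumption.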

2.2 Two-dimensional electrophoresis

Sample proteins are resolved by 2-D electrophoresis using
the 20 × 25 cm Iso-Dalt® 2-D gel system ([26-29]; pro-
duced by LSB and by Hoefer Scientific Instruments, San
Francisco) operating with 20 gels per batch. All first-dimen-
sional isoelectric focusing (IEF) gels are prepared using the
same single standardized batch of carrier ampholytes
(BDH 4-8A in the present case, selected by LSB's batch-
testing program for rat and mouse database work**). A 10
µL sample of solubilized liver protein is applied to each gel,
and the gels are run for 33 000 to 34 500 volt-hours using a
progressively increasing voltage protocol implemented by
a programmable high-voltage power supply. An Ange-
lique™ computer-controlled gradient-casting system (pro-
duced by LSB) is used to prepare second-dimensional sod-
ium dodecyl sulfate (SDS) polyacrylamide gradient slab
gels in which the top 5% of the gel is 11%T acrylamide, and
the lower 95% of the gel varies linearly from 11% to 18%T.
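
The gel composition described above is a piecewise-linear function of depth, which can be written out as follows. This is a minimal sketch, not the casting system's actual control logic: the function name `percent_t` and the convention that depth runs from 0 at the top of the gel to 1 at the bottom are assumptions made for illustration.

```python
def percent_t(depth_fraction: float,
              plateau: float = 0.05,
              top_t: float = 11.0,
              bottom_t: float = 18.0) -> float:
    """Acrylamide %T at a fractional depth (0 = top of gel, 1 = bottom).

    The top `plateau` fraction is held at `top_t`; below it, %T rises
    linearly and reaches `bottom_t` at the bottom edge.
    """
    if not 0.0 <= depth_fraction <= 1.0:
        raise ValueError("depth_fraction must be between 0 and 1")
    if depth_fraction <= plateau:
        return top_t
    ramp = (depth_fraction - plateau) / (1.0 - plateau)
    return top_t + (bottom_t - top_t) * ramp


print(percent_t(0.0))  # 11.0
print(percent_t(1.0))  # 18.0
```

Evaluating the function at intermediate depths gives the nominal %T a protein encounters as it migrates, for instance about 14.5%T halfway down the gradient region.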

This system has recently been modified so as to employ a
commercially available 30.8%T acrylamide/N,N'-methyle-
nebisacrylamide prepared solution (thus avoiding the han-
dling of the solid acrylamide monomer) and three addi-
tional stock solutions: buffer (made from Sigma pre-set
Tris), persulfate and N,N,N',N'-tetramethylethylenedi-
amine (TEMED). Each gel is identified by a computer-
printed filter paper label polymerized into the lower left cor-
ner of the gel. First-dimensional IEF tube gels are loaded

** This material (succeeding certified batches of which are available from
Hoefer Scientific Instruments) has the most linear pH gradient pro- 
duced by any ampholyte tested except for the Pharmacia wide range 
(which has an unacceptable tendency to bind high-molecular weight 
acidic proteins, causing them to streak). 
directly (as extruded) onto the slab gels without equilibration
and held in place by polyester fabric wedges (Wedgies,
produced by LSB) to avoid the use of hot agarose.
Second-dimensional slab gels are run overnight, in groups
of 20, in cooled DALT tanks (10°C) with buffer circulation.
All run parameters, reagent source and lot information,
and notations of deviation from expected results are entered
by the technician responsible on a detailed, multi-page
record of the experiment.

2.3 Staining

Following SDS-electrophoresis, slab gels are stained for
protein using a colloidal Coomassie Blue G-250 procedure
in covered plastic boxes, with 10 gels (totalling approximately
1 L of gel) per box. This procedure (based on the work
of Neuhoff [30, 31]) involves fixation in 1.5 L of 50% ethanol
and 2% phosphoric acid for 2 h, three 30 min washes,
each in 2 L of cold tap water, and transfer to 1.5 L of 34%
methanol, 17% ammonium sulfate and 2% phosphoric acid
for 1 h, followed by the addition of a gram of powdered Coomassie
Blue G-250 stain. Staining requires approximately 4
days to reach equilibrium intensity, whereupon gels are
transferred to cool tap water and their surfaces rinsed to remove
any particulate stain prior to scanning. Gels may be
kept for several months in water with added sodium azide.
The water washes remove ethanol that would dissolve the
stain (and render the system noncolloidal, with high backgrounds).
The concentrated ammonium sulfate and methanol
solution is diluted by equilibration with the water volume
of the gels to automatically achieve the correct final
concentrations for colloidal staining. Practical advantages
of this staining approach can be summarized as follows: (i)
the low, flat background makes computer evaluation of
small spots (max OD < 0.02) possible, especially when
using laser densitometry; (ii) up to 1500 spots can be reliably
detected on many gels (e.g., rat liver) at loadings low
enough to preserve excellent resolution; and (iii) reproducibility
appears to be very good: at least several hundred
spots have coefficients of reproducibility less than 15%.
This value is at least as good as previous CBB methods, and
significantly better than many silver stain systems.
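
The dilution step in the staining procedure can be checked with a short calculation. The sketch below assumes, for illustration only, that the approximately 1 L of gel in each box behaves as 1 L of water and that equilibration with the 1.5 L of concentrated solution is complete; the function name is hypothetical.

```python
def diluted_concentrations(stock_percent, stock_volume_l, gel_water_volume_l):
    """Concentrations after the stock equilibrates with the gel water.

    Assumes ideal mixing and treats the gel volume as pure water
    (a simplifying assumption of this sketch).
    """
    factor = stock_volume_l / (stock_volume_l + gel_water_volume_l)
    return {name: pct * factor for name, pct in stock_percent.items()}


# Stock solution composition and volumes taken from the staining protocol.
stock = {"methanol": 34.0, "ammonium sulfate": 17.0, "phosphoric acid": 2.0}
final = diluted_concentrations(stock, stock_volume_l=1.5, gel_water_volume_l=1.0)

if __name__ == "__main__":
    for name, pct in final.items():
        print(f"{name}: {pct:.1f}%")
```

Under these assumptions the dilution factor is 1.5/2.5 = 0.6, giving roughly 20% methanol, 10% ammonium sulfate and 1.2% phosphoric acid in the equilibrated staining bath.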

2.4 Positional standardization 

The carbamylated rabbit muscle creatine phosphokinase 
(CPK) standards [32] are purchased from Pharmacia and 
BDH. Amino acid compositions, and numbers of residues 
present in proteins used for internal standardization, are 
taken from the Protein Identification Resource (PIR) se- 
quence database [33]. 

2.5 Computer analysis

Stained slab gels are digitized in red light at 134 micron resolution,
using either a Molecular Dynamics laser scanner
(with pixel sampling) or an Eikonix 78/99 CCD scanner.
Raw digitized gel images are archived on high-density DAT
tape (or equivalent storage media) and a greyscale videoprint
prepared from the raw digital image as hard-copy
backup of the gel image. Gels are processed using the Kepler®
software system (produced by LSB), a commercially
available workstation-based software package built on
some of the principles of the earlier TYCHO system [34-41].
Procedure PROC008 is used to yield a spotlist giving
position, shape and density information for each detected
spot. This procedure makes use of digital filtering, mathematical
morphology techniques and digital masking to remove
the background, and uses full 2-D least-squares optimization
to refine the parameters of a 2-D Gaussian shape
for each spot. Processing parameters and file locations are
stored in a relational database, while various log files detailing
operation of the automatic analysis software are archived
with the reduced data. The computed resolution and
level of Gaussian convergence of each gel are inspected
and archived for quality control purposes.

Experiment packages are constructed using the Kepler experiment
definition database to assemble groups of 2-D
patterns corresponding to the experimental groups (e.g.,
treated and control animals). Each 2-D pattern is matched
to the appropriate "master" 2-D pattern (pattern
F344MST3 in the case of Fischer 344 rat liver), thereby
providing linkage to the existing rodent protein 2-D databases.
The software allows experiments containing hundreds
of gels to be constructed and analyzed as a unit, with
up to 100 gels displayed on the screen at one time for comparative
purposes and multiple pages to accommodate experiments
of > 1000 gels. For each treatment, proteins
showing significant quantitative differences vs. appropriate
controls are selected using group-wise statistical parameters
(e.g., Student's t-test, Kepler® procedure STUDENT).
Proteins satisfying various quantitative criteria (such as
P < 0.001 difference from appropriate controls) are represented
as highlighted spots onscreen or on computer-plotted
protein maps and stored as spot populations (i.e., logical
vectors) in a liver protein database. Quantitative data
(spot parameters, statistical or other computed values) are
stored as real-valued vectors in the database. Analysis of coregulation
is performed using a Pearson product-moment
correlation (Kepler procedure CORREL) to determine
whether groups of proteins are coordinately regulated by
any of the treatments. Such groups can be presented graphically
on a protein map, and reported together with the statistical
criteria used to assess the level of coregulation. Multivariate
statistical analysis (e.g., principal components analysis)
is performed on data exported to SAS (SAS Institute).
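
The group-wise selection step can be illustrated with a textbook pooled-variance Student's t statistic. This is a minimal sketch and not the Kepler STUDENT procedure, whose internals are not described here. The abundance values are invented, the function names are hypothetical, and the critical value 5.041 is the standard two-sided P = 0.001 point of the t distribution for 8 degrees of freedom (two groups of five animals).

```python
from statistics import mean, variance

# Two-sided critical value of Student's t for P = 0.001 with 8 degrees
# of freedom (5 treated + 5 control animals), from standard t tables.
T_CRIT_P001_DF8 = 5.041


def student_t(a, b):
    """Pooled-variance two-sample Student's t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def is_significant(treated, control, t_crit=T_CRIT_P001_DF8):
    """True if the spot differs from controls at the chosen level.

    The default critical value is only valid for two groups of five.
    """
    return abs(student_t(treated, control)) > t_crit


# Invented abundances for one spot in five control and five treated gels.
control = [10.0, 11.0, 9.0, 10.0, 10.0]
treated = [20.0, 21.0, 19.0, 20.0, 20.0]

if __name__ == "__main__":
    print(round(student_t(treated, control), 2), is_significant(treated, control))
```

Applied spot by spot, a test of this kind yields the logical vector (spot population) of proteins that differ from controls.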

2.6 Graphical data output

Graphical results are prepared in GKS and translated
within Kepler® into output for any of a variety of devices.
Linedrawing output is typically prepared as Postscript and
printed on an Apple LaserWriter. Detailed maps presented
here have been generated using an ultra-high-resolution
Postscript-compatible Linotronic output device. Greyscale
graphics are reproduced from the workstation screen using
a Seikosha videoprinter. Patterns are shown in the standard
orientation, with high molecular mass at the top and acidic
proteins to the left.

2.7 Experiment LSBC04 

In the study described here 12-week-old Charles River
male F344 rats were used. Diets were prepared at LSB,
based on a Purina 5755M Basal Purified Diet. Lovastatin
and cholestyramine were obtained as prescription pharmaceuticals,
ground and mixed with the diet at concentrations
of 0.075% and 1%, respectively. The high cholesterol diet
was Purina 5801M-A (3% cholesterol plus 1% sodium cholate
in the control diet). Animal work was carried out by Microbiological
Associates (Bethesda, MD). Animals were acclimatized
for one week on the control diet, fed test or control
diets for one week, and sacrificed on day 8. Average
daily doses of lovastatin and cholestyramine in appropriate
groups were 37 mg/kg/day and 5 g/kg/day, respectively,
based on the weight of the food consumed. Liver samples
were collected and prepared for 2-D electrophoresis according
to the standard liver protocol (homogenization in 8
volumes of 9 M urea, 2% NP-40, 0.5% dithiothreitol, 2%
LKB pH 9-11 carrier ampholytes, followed by centrifugation
for 30 min at 80 000 × g). Kidney, brain and plasma
samples were frozen. Gels were run as described above,
and the data was analyzed using the Kepler® system. Gels
were scaled, to remove the effect of differences in protein
loading, by setting the summed abundances of a large number
of matched spots equal for each gel (linear scaling).
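
The linear scaling step can be sketched as follows. The data layout (a dictionary of gels, each mapping spot numbers to abundances), the function name and the choice of the mean reference sum as the common target are assumptions made for illustration; the text specifies only that the summed abundances of the matched reference spots are set equal for each gel.

```python
def scale_gels(gels, reference_spots):
    """Linearly scale each gel so its reference-spot sum matches a common target.

    gels: {gel_name: {spot_number: abundance}} (hypothetical layout)
    reference_spots: spots matched in every gel and used for scaling
    """
    sums = {name: sum(spots[s] for s in reference_spots)
            for name, spots in gels.items()}
    # Assumed target: the mean of the per-gel reference sums.
    target = sum(sums.values()) / len(sums)
    return {name: {s: v * target / sums[name] for s, v in spots.items()}
            for name, spots in gels.items()}


# Invented example: gel B carries twice the protein load of gel A.
gels = {
    "gelA": {1: 10.0, 2: 30.0, 3: 5.0},
    "gelB": {1: 20.0, 2: 60.0, 3: 9.0},
}
scaled = scale_gels(gels, reference_spots=[1, 2])

if __name__ == "__main__":
    print(scaled)
```

After scaling, both gels have the same summed abundance over the reference spots, so the remaining difference in spot 3 (7.5 versus 6.75) reflects something other than loading.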

3 Results and discussion 

3.1 The rat liver protein 2-D map

F344MST3 is a standard 2-D pattern of rat liver proteins,
based on the Fischer 344 strain. This pattern was initiated
from a single 2-D gel and extensively edited in an experiment
comparing it to a range of protein loads, so as to include
both small spots and well-resolved representations of
high-abundance spots. More than 700 rat liver 2-D patterns
have been matched to F344MST3 in a series of drug effects
and protein characterization experiments, and numerous
new spots (induced by specific drugs, for instance) have
been added as a result. A modified version including additional
spots present in the Sprague-Dawley outbred rat has
also been developed (data not shown). Figure 1 shows a
greyscale representation and Fig. 2 a schematic plot of the
master pattern. More than 1200 spots are included, most of
which are visible on typical gels loaded with 10 µL of solubilized
liver protein prepared by the standard method and
stained with colloidal Coomassie Blue. Master spot numbers
(MSN's) have been assigned to all proteins, and appear
in the following figures, each showing one quadrant of
the pattern. Figure 3 shows the upper left (acidic, high
molecular mass) quadrant, Fig. 4 the upper right (basic,
high molecular mass) quadrant, Fig. 5 the lower left (acidic,
low molecular mass) quadrant, and Fig. 6 the lower right
(basic, low molecular mass) quadrant. The quadrants overlap
as an aid to moving between them. The gel position (in
100 micron units), isoelectric point (relative to the CPK internal
pI standards) and SDS molecular mass (from the calibration
curve in Fig. 8) are listed for each spot (Table 1). Because
of the precision of the CPK-pI values, these parameters
can be used to relate spot locations between gel systems
more reliably than using pI measurements expressed
as pH. A major objective of current studies is the identification
of all major spots corresponding to known liver proteins,
as well as rigorous definitions of subcellular organelle
contents. Of particular interest to us is the parallel development
of identifications in the rat and mouse liver
maps, allowing detailed comparisons of gene expression effects
in the two systems. The results of these studies will be
presented systematically in a later edition of this database,
but we include here a useful series of 22 orienting identifications
as an aid to other users of the rat liver pattern (Table
2).



y-coordinate, a is 511.83, b is -0.2731 and c is 33183801. The
resulting fit appears to be fairly good over a broad range of
molecular mass.
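
The molecular mass calibration (Fig. 8) can be sketched in a few lines. The sketch assumes the fitted curve has the form residues = a + b·y + c/y², with y the gel y-coordinate in 100 micron units and the coefficients 511.83, -0.2731 and 33183801 as read from the text, and that predicted mass is the residue count multiplied by 112. The function names are hypothetical, and rounding to the nearest 100 simply mirrors how masses appear in Table 1.

```python
# Coefficients of the fitted calibration curve, as read from the text.
A = 511.83
B = -0.2731
C = 33183801.0
MEAN_RESIDUE_MASS = 112.0  # weighted average mass of an amino acid residue


def predicted_residues(gel_y):
    """Predicted number of amino acid residues for a gel y-coordinate
    given in 100 micron units."""
    return A + B * gel_y + C / gel_y ** 2


def predicted_mass(gel_y):
    """Predicted SDS molecular mass (Da), rounded to the nearest 100."""
    return round(predicted_residues(gel_y) * MEAN_RESIDUE_MASS, -2)


if __name__ == "__main__":
    for y in (434, 520):
        print(y, predicted_mass(y))
```

For example, a spot at y = 434 gives about 569.5 residues, or a predicted mass near 63 800, and a spot at y = 520 gives a predicted mass near 55 200.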



3.2 Carbamylated charge standards, computed pIs and
molecular mass standardization

We have previously shown that the use of a system of closely-spaced
internal pI markers (made by carbamylating a
basic protein) offers an accurate and workable solution to
the problem of assigning positions in the pI dimension [32].
The same system, based on 36 protein species made by carbamylating
rabbit muscle CPK, has been used here to assign
pIs to most rat liver acidic and neutral proteins. The
standards were coelectrophoresed with total liver proteins,
and the standard spots added to a special version of the
master pattern F344MST3. The gel x-coordinates of all
liver protein spots lying within the CPK charge train were
then transformed into CPK pI positions by interpolation
between the positions of immediately adjacent standards
(Table 1) using a Kepler® vector procedure.
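
The interpolation between adjacent standards can be sketched as below. This is not the Kepler vector procedure itself; linear interpolation is assumed here, and the standard positions in the example are invented. Each standard is represented by its gel x-coordinate and its charge index (0 for unmodified CPK, -1 for one blocked lysine, and so on), and spots lying outside the charge train are left unassigned.

```python
from bisect import bisect_right


def cpk_pi(gel_x, standards):
    """Interpolate a CPK pI position from a gel x-coordinate.

    standards: list of (gel_x, charge_index) pairs sorted by gel_x.
    Returns None for spots lying outside the CPK charge train.
    """
    xs = [x for x, _ in standards]
    if gel_x < xs[0] or gel_x > xs[-1]:
        return None
    i = bisect_right(xs, gel_x)
    if i == len(xs):
        # Spot coincides with the last (most basic) standard.
        return float(standards[-1][1])
    (x0, c0), (x1, c1) = standards[i - 1], standards[i]
    # Linear interpolation between the two immediately adjacent standards.
    return c0 + (c1 - c0) * (gel_x - x0) / (x1 - x0)


# Invented positions for four members of a charge train.
standards = [(300, -3), (400, -2), (520, -1), (660, 0)]

if __name__ == "__main__":
    print(cpk_pi(460, standards), cpk_pi(100, standards))
```

A spot midway between the -2 and -1 standards is assigned a CPK pI of -1.5, while a spot to the acidic side of the last standard is reported as outside the train.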

It has proven possible to compute fairly accurate pI values
for many proteins from the amino acid composition [42].
We have attempted here to test a further elaboration of this
approach, in which we computed pIs for the CPK standards
themselves, based on our knowledge of the rabbit muscle
CPK sequence and the fact that adjacent members of the
charge train typically differ by blockage of one additional lysine
residue (Table 3). We compared these values to similar
computed pIs for an additional set of carbamylated standards
made from human hemoglobin beta chains and a series
of rat liver and human plasma proteins of known position
and sequence (Fig. 7, Table 4). The result demonstrates
good concordance between these systems. Two proteins
show significant deviations: liver fatty-acid binding protein
(FABP; #1 in Table 4) and protein disulphide isomerase
(#20 in the table). The FABP spot present on F344MST3
may represent a charge-modified version of a more basic
parent spot closer to the expected pI, not resolved in the
IEF/SDS gel. Of particular importance is the fact that, by
comparing computed pIs of sequenced but unlocated proteins
with the CPK pIs, we can assign a probable gel location
without making any assumptions regarding the actual
gel pH gradient. This offers a useful shortcut, given the vagaries
of pH measurement on small diameter IEF gels. We
have used this approach to compute the CPK pIs of all rat
and mouse proteins in the PIR sequence database, as an aid
to protein identification (data not shown).
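
The idea of computing a pI from composition, and of generating a charge train by blocking one lysine at a time, can be sketched as follows. Everything specific in this sketch is an assumption: the pK values are generic textbook-style figures chosen for illustration (not those used in [42]), the composition is invented (it is not the rabbit muscle CPK composition), and the function names are hypothetical.

```python
# Illustrative pK values (assumptions of this sketch, not taken from [42]).
PK_POS = {"K": 10.5, "R": 12.5, "H": 6.0, "Nterm": 9.0}
PK_NEG = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "Cterm": 3.1}


def net_charge(counts, ph):
    """Net charge of a protein at a given pH (Henderson-Hasselbalch)."""
    pos = sum(n / (1 + 10 ** (ph - PK_POS[g]))
              for g, n in counts.items() if g in PK_POS)
    neg = sum(n / (1 + 10 ** (PK_NEG[g] - ph))
              for g, n in counts.items() if g in PK_NEG)
    return pos - neg


def computed_pi(counts, lo=0.0, hi=14.0, tol=1e-4):
    """Find the pH of zero net charge by bisection
    (net charge decreases monotonically with pH)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(counts, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def charge_train(counts, steps):
    """Computed pI after blocking 0, 1, ..., steps lysine residues."""
    train = []
    for blocked in range(steps + 1):
        modified = dict(counts)
        modified["K"] = counts["K"] - blocked  # carbamylation removes a charge
        train.append(computed_pi(modified))
    return train


# Invented composition of ionizable groups for a hypothetical protein.
composition = {"K": 34, "R": 18, "H": 17, "D": 28, "E": 27,
               "C": 4, "Y": 9, "Nterm": 1, "Cterm": 1}
train = charge_train(composition, steps=5)

if __name__ == "__main__":
    print([round(p, 2) for p in train])
```

Each additional blocked lysine lowers the computed pI, which is the behaviour expected of successive members of a carbamylation train.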

In order to standardize SDS molecular weight (SDS-MW),
we have used a standard curve fitted to a series of identified
proteins (Fig. 8). Rather than using molecular mass per se,
we have elected to use the number of amino acids in the
polypeptide chain, as perhaps a better indication of the
length of the SDS-coated rod that is sieved by the second
dimension slab. The resulting values were multiplied by
112 (the weighted average mass of amino acids in sequenced
proteins) to give predicted molecular masses. Because
we use gradient slabs, we have not constrained the fitted
curve to conform to any predetermined model; rather
we tried many equations and selected the best using the
program "TableCurve" on a PC. The equation chosen was
$y = a + bx + c/x^{2}$, where y is the number of residues, x is the gel



3.3 An example of rat liver gene regulation: Cholesterol
metabolism

Experiment LSBC04 was designed as a small-scale test of
the regulation of cholesterol metabolism in vivo by three
agents included in the diet: lovastatin (Mevacor®, an inhibitor
of HMG-CoA reductase); cholestyramine (a bile acid
sequestrant that has the effect of removing cholesterol
from the gut-liver recirculation); and cholesterol itself. The
first two agents should lower available cholesterol and the
third should raise it, allowing manipulation of relevant
gene expression control systems in both directions. Such
an experiment offers an interesting test of the 2-D mapping
system since most of the pathway enzymes are present in
low abundance, many are membrane-bound and difficult
to solubilize, and the pathway itself is complex. Approximately
1000 proteins were separated and detected in liver
homogenates. Twenty-one proteins were found to be affected
by at least one treatment, and these could be divided into
several coregulated groups.



3.3.1 MSN 413 (putative cytosolic HMG-CoA synthase)
and sets of spots regulated coordinately or inversely

One group of spots (including a spot assigned to the cytosolic
HMG-CoA synthase, MSN 413) showed the expected
increase in abundance with lovastatin or cholestyramine,
the synergistic further increase with lovastatin and cholestyramine,
and a dramatic decrease with the high cholesterol
diet. Spot number 413 is the most strongly regulated protein
in the present experiment, showing a 5- to 10-fold induction
after a 1 week treatment with 0.075% lovastatin and
3% cholestyramine in the diet (Figs. 9 and 10). Its expression
follows precisely the expectation for an enzyme whose
abundance is controlled by the cholesterol level; it is progressively
increased from the control levels by cholestyramine,
lovastatin and lovastatin plus cholestyramine, and it
sinks below the threshold of detection in animals fed the
high cholesterol diet. This spot has been tentatively identified
as the cytosolic HMG-CoA synthase, based on a reaction
with an antiserum to that protein provided by Dr. Michael
Greenspan at Merck Sharp & Dohme Research Laboratories.
This enzyme lies immediately before HMG-CoA
reductase in the liver cholesterol biosynthesis pathway and
is known to be co-regulated with it. Spot 413 has an SDS
molecular weight of about 54 000 and a CPK pI of -11.4, in
reasonably close agreement with a molecular weight of
57 300 and a CPK pI of -15.7 computed from the known sequence
of the hamster enzyme [43].

Using a classical product-moment correlation test (Kepler
procedure CORREL), a series of five additional spots was
found to be coregulated with 413. The level of correlation
was exceedingly high (p > 95%). Two of these, 1250 and 933,
are at similar molecular weights and approximately one
charge more acidic than 413 (Fig. 9), indicating that they
may be covalently modified forms of the 413 polypeptide.
This suspicion is strengthened by the observation that both
spots are also stained by the antibody to cytosolic HMG-CoA
synthase. The remaining three correlated spots appear
to comprise an additional related pair (1253 and 1001) of
around 40 kDa and a single spot (1119) of around 28 kDa.
Because these two presumed proteins are present at substantially
lower abundances than 413, and because the cytosolic
HMG-CoA synthase is reported to consist of only one
type of polypeptide, they are likely to represent other, very
tightly coregulated enzymes. A second group of six spots
was selected based on a regulatory pattern close to the inverse
of that for spot 413 (MSN's 34, 79, 178, 182, 204, 347;
data not shown). For these proteins, the lowest level of expression
occurs with exposure to lovastatin plus cholestyramine
and the highest level upon exposure to the high-cholesterol
diet. Spots 182 and 79 are highly correlated and lie
about one charge apart at the same molecular weight; they
may thus be isoforms of a single protein. The other four
spots probably represent additional enzymes or subunits.
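
The coregulation search can be illustrated with a plain product-moment (Pearson) correlation across the gels of an experiment. This is a sketch rather than the Kepler CORREL procedure: the abundance vectors are invented, the function names are hypothetical, and the 0.95 threshold on r is an assumption made for illustration. A strongly negative r would flag inversely regulated spots in the same way.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coregulated(reference, others, threshold=0.95):
    """Spot numbers whose abundances correlate with the reference spot
    at or above the (assumed) threshold."""
    return sorted(spot for spot, values in others.items()
                  if pearson_r(reference, values) >= threshold)


# Invented abundances across five gels for a reference spot and two others.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
others = {
    933: [2.0, 4.0, 6.0, 8.0, 10.5],   # rises with the reference spot
    182: [5.0, 4.0, 3.0, 2.0, 1.0],    # falls as the reference rises
}

if __name__ == "__main__":
    print(coregulated(reference, others))
    print(round(pearson_r(reference, others[182]), 3))
```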

3.3.2 MSN 235 and coregulated spots 

A third group of five spots, mainly comprised of mitochondrial
proteins including putative mitochondrial HMG-CoA
synthase spots, showed a modest induction by lovastatin
alone, but little or no effect with any of the other treatments
(including the combination of lovastatin and cholestyramine;
Fig. 12). This result is intriguing because lovastatin
was expected to affect only the regulation of enzymes of
cholesterol synthesis, which is entirely extra-mitochondrial.
Three of the spots (235, 134, 144) form a closely-packed
triad at approximately 30 kDa, and are likely to represent
isoforms of one protein. All three spots are stained
by an antibody to the mitochondrial form of HMG-CoA
synthase obtained from Dr. Greenspan. Subcellular fractionation
indicates a mitochondrial location. The other two
spots (633 at about 38 kDa and 724 at about 69 kDa) are
each present at lower abundance than the members of the
triad.

3.3.3 An example of an anti-synergistic effect

A sixth spot (367) shows strong induction by lovastatin 
(two- to threefold), and about half as much induction with 
lovastatin plus cholestyramine, but without sharing the ani- 
mal-animal heterogeneity pattern of the 235-set (Fig. 13). 
This protein is also mitochondrial, and represents the clear- 
est example of an anti-synergistic effect of lovastatin and 
cholestyramine. The existence of such an effect demon- 
strates that lovastatin and cholestyramine do not act exclu- 
sively through the same regulatory pathway. 

3.3.4 Complexity of the cholesterol synthesis pathway

Taken together, these results suggest that treatment with lovastatin
alone can affect both cytosolic and mitochondrial
pathways using HMG-CoA, while cholestyramine, on the
other hand, either alone or in combination with lovastatin,
produces a strong effect on the putative cytosolic pathway,
but little or no effect on the putative mitochondrial pathway.
An explanation for this difference may lie in lovastatin's
effect on levels of HMG-CoA and related precursor
compounds that are exchanged between the cytosol and
the mitochondrion, whereas cholestyramine should affect
only the cytosolic pathways directly controlled by cholesterol
and bile acid levels. It remains to be explained why some
proteins of the putative mitochondrial pathway are so
much more variable in their expression in all groups. An examination
of all the coregulated groups suggests that quantitative
statistical techniques can extract a wealth of interesting
information from large sets of reproducible gels. The
abundance of spots in the 413 coregulation group, for example,
shows an amazing level of concordance in their relative
expression among the five individuals of the lovastatin and
cholestyramine treatment group. This effect is not due to
differences in total protein loading, since they have already
been removed by scaling, and since proteins with quite different
regulation patterns can be demonstrated (e.g., Fig.
13). Such effects raise the possibility that many gene coregulation
sets may be revealed through the study of a sufficiently
large population of control animals (i.e., without
any experimental manipulation). This approach, exploiting
natural biological variation in protein expression instead of
drug effects, offers an important incentive for the construction
of a large library of control animal patterns.



4 Conclusions 

Because of the widespread use of rat liver in both basic biochemistry
and in toxicology, there is a long-term need for a
comprehensive database of liver proteins. The rat liver master
pattern presented here has proven to be an accurate representation
of this system, having been matched to more
than 700 gels to date. As the number of proteins identified
and the number of compounds tested for gene expression
effects grows, we expect this database to contribute valuable
insights into gene regulation. Its practical utility in several
areas of mechanistic toxicology is already being demonstrated.

Received September 11, 1991 
Figure 1. Synthetic representation of the standard rat liver 2-D master pattern, rendered as a greyscale image using a videoprinter.
Figure 3. Upper left (high molecular weight, acidic) quadrant (#1) of the rat liver map, showing spot numbers.
Figure 8. Plot of number of amino acids versus gel y-position (axis label: Gel Y Coordinate), with fitted
curve used to predict molecular mass of unidentified proteins.



CPK position 



Figure 7. (a) Plot of computed isoelectric point versus gel x-position for
two sets of carbamylated standard proteins (rabbit muscle CPK and
human hemoglobin beta chains; diamonds) and several other proteins
(shaded squares). (b) The identities of the various proteins represented
Figure 9. Montage showing effects in the
region of MSN:413. The montage shows a
small window into one portion of the 2-D
pattern, one row of windows for each experimental
group, and one panel for each gel
in the experiment. The left-most pattern
in each row is a group-specific copy of the
master pattern followed by the patterns
for the five individual rats in the group.
The highlighted protein spots (filled circles)
are spot 413 (on the right of each panel;
identified as cytosolic HMG-CoA synthase)
and two modified forms of it (1250
and 933). From the top, the rows (experimental
groups) are: high cholesterol, controls,
cholestyramine, lovastatin, and lovastatin
plus cholestyramine.
Figure 10. Bargraph showing the quantitative
effects of various treatments on the
abundance of MSN:413 (cytosolic HMG-CoA
synthase) in the gels of Fig. 9.




Figure 11. Bargraphs of a series of six coregulated
spots including MSN:413. In the
bargraphs, the abundances of the appropriate
spot (master spot number shown at
the top of the panel) in each animal are
shown. The five five-animal groups are in
the order (left to right): high cholesterol,
controls, cholestyramine, lovastatin, and
lovastatin plus cholestyramine. Each bar
within a group represents one experimental
animal liver (one 2-D gel). Note the correlated
expression of the 6 spots, especially
in the two far right (most strongly induced)
groups.
Figure 12. Data on a second coregulated
group of spots, presented as in Fig. 11. The
fourth experimental group (lovastatin)
shows a modest induction, while the fifth
group (lovastatin plus cholestyramine)
does not.




Figure 13. Data on spot MSN:367, presented as in Fig. 11. This protein
shows unambiguously the anti-synergistic effect of lovastatin and cholestyramine
(fifth group) as compared to lovastatin (fourth group). This response
contrasts strongly with the regulation pattern seen in Fig. 11.
7 Addendum 2: Tables 1-4

Table 1. Master table of proteins in the rat liver database



Database of rat liver proteins 923



MSN 



Y CPKpI SDSMW



MSN 



Y CPKpI SDSMW



3 
5 
8 

11 
15 
17 
18 
19 
20 
21 
22 
23 
24 
25 



311 
568 
812 
548 
645 
629 
906 
755 
649 
1204 
332 
787 
313 
807 



27 1184 

28 1263 



29 
30 



743 
768 



32 1216 

-33 1145 

34 1037 

35 863 



36 
38 
39 
41 
42 



712 
763 
304 
1165 
6B4 



43 1318 

44 1924 



434 

263 

426 

266 

520 

569 

414 

298 

403 

448 

434 

424 

417 

516 

524 

446 

605 

112 

417 

445 

555 

412 

606 

694 

470 

569 

607 

569 

362 



46 


1203 


586 


47 


1391 


447 


48 


309 


454 


49 


605 


567 


50 


621 


535 


51 


1113 


522 


52 


1820 


496 


53 


725 


177 


54 


2001 


500 


55 


722 


830 


56 


678 


533 


57 


1662 


302 


56 


1091 


580 


59 


1171 


585 


60 


1400 


624 


61 


1853 


506 


62 


1688 


567 


65 


735 


297 


66 


1263 


312 


67 


1252 


407 


66 


779 


692 


69 


1064 


296 


71 


656 


589 


72 


638 


545 


73 


1562 


563 


74 


1570 


556 



75 1264 

76 1338 

77 1633 

78 1767 



79 
80 



925 
534 



81 1611 

82 1412 

83 1471 

84 1662 

85 1596 

86 1817 

87 516 

88 1589 

89 1706 



90 
91 



651 
1415 



92 1773 
S3 1338 
94 1708 



621 
564 
363 
565 
738 
698 
363 
681 
347 
563 
479 
301 
1371 
698 
719 
329 
710 
545 
446 
696 



t-35.0 
-24.3 
-16.0 
-25.2 
-15 J 
-21.6 
•14.0 
-17.5 
-20.9 
4.7 
<45.0 
-16.6 
<-35.0 
-16.1 
•9.0 
4.0 
-17.8 
-17.2 
-8.6 
•9.5 
-11 J 
-14.9 
-18.7 
-17 J 
c-35.0 
-6.2 
•19.6 
-7.3 
-0.1 
-8.7 
-6.3 
<-35.0 
•22.5 
-21.8 
-10.0 
•0.8 
•18.3 
>0.0 
•18.4 
-19 A 
-2.5 
•10.3 
-9.2 
-6.2 
-0.6 
-0.4 
-18.1 
4.0 
-8.1 
•16.8 
•10.8 
-20.6 
-21.2 
•3.6 
-3.8 
4.0 
-7.0 
-0.6 
-1.5 
•13.6 
•26.1 
-1.0 
4.0 
•5.0 
•2.7 
•3.4 
-0.9 
•27.0 
•3.5 
•2.2 
-20.6 
4.0 
-1.4 
-7.0 
-2.2 



63.800 
102,900 
64,800 
101,000 
55,200 
50.000 
66,300 
90.200 
67,900 
62.100 
63,800 
65.000 
66.000 
55.500 
54,900 
62.400 
49.000 
348.600 
66,000 
62,500 
52.400 
66.600 
48,900 
43,800 
59,800 
51.400 

4£,eoo 

50,000 
74.600 
50,200 
62.300 
61,500 
50,100 
53.900 
55.000 
57,000 
170.800 
56,900 
37.300 
54,100 
89.000 
50.600 
50,300 
47.800 
56.200 
51,500 
90.500 
65,900 
67.300 
43,900 
90.800 
50.000 
53,100 
50,400 
52.300 
48.000 
51,800 
74,400 
51.700 
41.600 
43.600 
74,500 
44.500 
77.500 
51,800 
58.900 
89.100 
17.400 
43.600 
42.500 
81,700 
43.000 
53.200 
62.300 
43.700 



95 
96 

97 
98 
99 

100 
101 
102 
103 
104 
105 
106 
107 
106 
109 
110 
111 
113 
114 
115 
116 
117 
118 
120 
121 
122 
123 
124 
125 
126 
127 
128 
129 
130 
131 
132 
133 
134 
135 
136 
137 
138 
138 
140 
141 
142 
143 
144 
145 
146 
147 
148 
149 
150 
151 
152 
153 
154 
155 
156 
157 
156 
159 
160 
161 
162 
164 
166 
167 
168 
166 
170 
171 
172 
173 



1119 
1731 
1033 
1406 
578 
2004 
1106 
462 
665 
773 
312 
1769 
1585 
1692 
1482 
778 
1728 
1191 
1298 
682 
1146 
1548 
1050 
1530 
638 
1572 
23 
621 
1298 
672 
1000 
1229 
1422 
1776 
1930 
660 
666 
1271 
1161 
453 
1858 
1504 
1468 
1689 
311 
1366 
1429 
615 
2006 
2006 
1070 
1347 
541 
1645 
1269 
1507 
1722 
932 
1031 
1970 
1258 
1275 
1663 
1034 
1953 
1020 
1566 
1905 
1340 
1506 
1336 
1969 
800 
476 
919 



536 
756 
566 
565 
1149 
538 
623 
455 
630 
1162 
1117 
509 
720 
807 
593 
516 
700 
680 
165 
907 
610 
649 
577 



423 
712 
1433 
1474 
862 
921 
717 
311 
632 
499 
757 
537 
1019 
862 
1389 
1063 
623 
697 
707 
756 
1417 
915 
346 
1017 
566 
518 
1106 
576 
1481 
760 
236 
911 



503 
294 
664 
163 
417 
620 
527 
771 
1482 
806 
565 
181 
583 
678 
541 
378 
958 
1314 



4.9 
-2.0 
-11.4 
4.1 
•23.8 
>0.0 
-10.1 
•26.5 
-20.2 
•17.0 
<35.0 
-1.5 
-3.6 
-2.4 



-16.9 
•2.0 
4.9 
•7.5 
-19.6 
4.5 
-4.1 
-11.1 
-4.3 
-15.4 
-3.8 
<-35.0 
-21.9 
-7.5 
-14.7 
•12.0 
4.4 
4.8 
•1.4 
4.1 
•20.4 
-20.2 
-7.9 
•9.3 
•29.7 
4.6 
-4.6 
-4.8 
-2.4 
<-35.0 
4.7 
-5.7 
•22.1 
>0.0 
>0.0 
-10.7 
4.9 
-25.7 
-2.8 
-7.9 
-4.5 
-2.1 
•13.5 
•11.4 
>0.0 
4.1 
■7.8 
•2.6 
•11.4 
>0.0 
•11.6 
•3.8 
-0.2 
•7.0 
-4.6 
-7.0 
>0.0 
-16.3 
-28.7 
13.7 



53.800 
40.700 
51,600 
51.700 
25,000 
53,700 
47,900 
61.300 
37,300 
23.800 
26,100 
56.100 
42.500 
38.300 
49,700 
55,500 
43.500 
44.500 
160.800 
34.100 
48.700 
36.500 
50.800 
37,400 
65.200 
42.900 
15.30C 
13.900 
36.00C 
33.50C 
42.60G 
86.10C 
37.30C 
57.00C 
40,700 
53,800 
29,700 
36.000 
16.B0C 
28.100 
37,700 
43,700 
43,200 
40,700 
15.800 
33.800 
77,900 
29,800 
51.600 
55.300 
26,500 
50.600 
13.700 
40,500 
117,000 
33.900 
62,100 
56.600 
91,400 
44.400 
162.400 
65,900 
37,800 
54,600 
40,000 
13.700 
38,400 
51,700 
164.900 
50.400 
44.700 
£3.500 
71.800 
32.100 
19.300 



MSN X 

174 1364 

175 825 

177 1582 

178 1321 

179 1069 

180 1866 

181 411 

182 804 

184 1860 

185 1997 

186 279 

187 773 

188 1538 

191 1560 

192 1818 

193 1469 

194 1380 

195 784 

196 1227 

197 667 
196 2006 
199 1711 



Y CPKpI SDSMW



200 
201 
202 
203 



872 
292 
736 
786 



204 1224 

205 439 

206 1994 

207 1895 

208 240 

210 1700 

211 902 

213 1087 

214 1340 

215 1591 

216 1565 

217 1159 



218 
219 



931 
713 



220 1479 

221 965 
223 934 

225 1812 

226 821 

227 1586 
226 1065 

229 1577 

230 1458 
232 1440 

234 1692 

235 618 



236 
237 



920 
952 



238 1611 

239 1489 

240 501 

241 1820 

242 1357 

243 711 

244 1855 

245 1189 

246 551 

247 1348 

248 460 

249 1733 

250 1974 

251 806 



252 
253 
254 



874 
753 
995 



255 1690 

256 994 



Master table of proteins in the rat liver database, showing spot master number, gel position (x and y),
predicted molecular mass (from the standard curve of Fig. 8).



257 
258 



508 
1517 



183. 
393 
553 
710 
615 
567 
295 
730 
896 
1017 
1113 
296 
807 
674 
687 
555 
266 
632 
1185 
553 
681 
674 
424 
435 
253 
829 
569 
983 
571 
687 
1418 
490 
517 
6S4 
668 
495 
755 
393 
572 
177 
911 
927 
716 
1045 
411 
1463 
567 
890 
496 
849 
439 
1004 
1138 
1006 
541 
720 
448 
569 
658 
1162 
621 
474 
456 • 
604 
448 
451 
788 
392 
553 
848 
450 
679 
1006 
464 
820 



4.7 
-15.7 
4.6 
-7.2 
-10.4 
4.5 
42.1 
-16.2 
4.6 
>0.0 
<-35.0 
-17.0 
4.2 
4.9 
4.9 
-5.0 
4.4 
-16.7 
4.4 
-20.1 
>0.0 
-2.2 
-14.7 
<-35.0 
-18.0 
-16.7 
4.5 
-30.9 
>0.0 
4.3 
<-35.0 
-2.3 
-14.1 
-10.4 
-7.0 
-3.5 
-3.6 
4.3 
•13.5 
-18.7 
-4.9 
-12.8 
-13.5 
-1.0 
-15.8 
4.6 
-10.8 
4.7 
-5.2 
-5.5 
-2.4 
-22.0 
-13.7 
-13.1 
•3.2 
-4.8 
-27.7 
4.9 
4.8 
•18.7 
4.6 
4.9 
-25.1 
4.9 
•29.3 
•1.9 
>0.0 
-16.1 
•14.6 
-17.6 
-12.1 
•2.4 
-12.1 
-27.4 
-44 



162.900 
69.300 
52.600 
43,000 
48.300 
51,600 
91,200 
42,000 
34,500 
29,800 
26.300 
90.800 
38.400 
44.900 
44,200 
52.400 
101.600 
47,300 
23.700 
52.600 
44.500 
44.900 
65.000 
63,700 

107.800 
37.400 
50.000 
31.100 
51.300 
44,200 
15.800 
57,000 
55.400 
44.400 
45.200 
57.300 
40.700 
69,300 
51.200 

170.500 
33.900 
33,300 
42.700 
28.800 
66.800 
13.600 

51.600 

34,800 

57,300 

36.500 

57.900 

30,300 

25.400 

30.200 

53.500 

42,500 

62.100 

51.400 

45.800 

23.800 

48.000 

59.300 

61,000 

49.100 

62.100 

61.800 

39.200 

69.500 

52.500 

36.500 

61.900 

44.600 

30.200 

60.400 

37,800 



isoelectric point relative to CPK standards, and 
MSN X Y CPKpI SDSMW



259 
260 
261 
262 
263 
265 
266 
267 
266 
269 
270 
271 
272 
274 
275 
276 
277 
278 
279 
281 
262 
283 
284 
285 
286 
288 
289 
290 
291 
292 
293 
294 
295 
296 
297 
299 
300 
301 
302 
303 
304 
305 
306 
307 
308 
309 
310 
311 
312 
313 
314 
315 
318 
320 
321 
322 
323 
324 
325 
326 
327 
328 
330 
331 
332 
333 
334 
335 
336 
338 
339 
340 
341 
343 
344 



1796 
661 
1725 
496 
1063 
1390 
510 
660 
430 
1044 
2019 
857 
895 
1292 
1350 
1670 
688 
961 
879 
1848 
1505 
1313 
1314 
1332 
1277 
1391 
1147 
925 
787 
1462 
531 
860 
1162 
218 
1377 
913 
2012 
702 
494 
403 
1B43 
1049 
1608 
1219 
1627 
1524 
1769 
1609 
266 
1902 
1316 
1341 
1104 
1480 
850 
1454 
670 
655 
1521 
1587 
1388 



961 
1361 
679 

1127 
172 
673 
437 

1036 
961 
606 
853 
422 
968 
712 
590 

1069 
538 
718 
570 

1084 
525 

1147 
629 
406 

652 

824 

579 

511 

1476 

818 

449 

698 

609 

814 

979 
1523 

667 

178 
1280 
1008 
1585 

563 



1608 
1566 
531 
784 
1059 
1593 
1616 
1854 
1265 
581 
1497 
1351 
1813 



916 
755 
892 
1028 
1451 
1406 
1365 
1395 
523 
1053 
1459 
603 
1494 
626 
101 
675 
677 
409 
1291 
751 
697 
471 
1156 
407 
303 
596 
1004 
888 
565 
1047 
265 
549 



-1.1 
-20.4 
•2.0 
-28.0 
-10.9 
-6.3 
-27.3 
-20.4 
-31.0 
-11.2 
>0.0 
-15.0 
-14.2 
-7.6 
-6.9 
•2.6 
-19.4 
-13.0 
-14.5 
-0.7 
-4.6 
-7.3 
-7.3 
•7.1 
-7.8 
-6.3 
-9.5 
•13.6 
•16.6 
-5.1 
-26.3 
-14.9 
-9.3 
«-35.0 
-6.5 
-13.9 
>0.0 
-19.0 
-28.1 
•32.6 
-0.7 
-11.1 
-3.3 
-8.5 
•3.0 
-4.4 
-1.5 
4.3 
<-35.0 
-0.3 
-7.3 
-7.0 
-10.1 
-4.9 
-15.1 
-5.3 
-20.0 
-20.6 
-4.4 
•3.6 
-6.3 
•30.0 
-3.3 
•3.8 
-26.3 
-16.7 
-10.9 
-3.5 
•3.2 
-0.6 
-8.0 
-23.6 
-4.7 
-6.8 
-0.9 



31.900 
17.700 

44,600 
25.800 
177.400 
45.000 
63,400 
29.000 
31.900 
48,900 
36,300 
65.200 
31,700 
42,900 
49,900 
27,100 
53.700 
42.600 
51.300 
27.300 
54.800 
25.100 
37,400 
67.200 
46.100 
37.600 
50,700 
55.900 
13,900 
37.800 
62.000 
43.600 
48,700 
38.000 
31.300 
12.400 
45,300 
169.200 
20,400 
30,100 
10,300 
49.800 
30.900 
33,700 
40,700 
34,700 
29,400 
14,700 
16,100 
17.600 
16.600 
54.900 
28.500 
14,400 
49,100 
13,300 
47,700 
420,500 
44,800 
44.700 
67,000 
20.100 
40.900 
43.700 
59,600 
24.700 
67,300 
88,500 
49,400 
30.300 
34.900 
50,300 
28.700 
102.200 
52.800 



345 
346 
347 
348 
349 
350 
351 
352 
353 
354 
355 
356 
357 
356 
359 
360 
361 
362 
363 
364 
365 
366 
367 
368 

370 

371 

372. 

373 

374 

375 

376 

377 

378 

379 

381 

382 

383 

384 

365 

386 

387 

388 

389 

390 

391 

392 

393 

394 

395 
396 
397 
399 
400 
401 
403 
404 
405 
406 
409 
410 
411 
412 
413 
415 
416 
417 
418 
419 
420 
421 
422 
423 
424 
425 



1006 
1095 
625 
361 
110 
521 
912 
1574 
961 
706 
1450 
1374 
474 
796 
764 
1364 
1713 
1161 
914 
412 
741 
678 
1560 
963 
434 
639 
1567 
■ 1875 
1351 
1506 
1823 
254 
1409 
621 
1017 
953 
856 
1252 
1699 
1042 
1490 
1554 
1193 
1374 
1456 
718 
1799 
1482 
1227 
1530 
1410 
912 
1465 
1473 
1029 
1516 
1495 
1525 
723 
650 
1501 
936 
350 
1033 
737 
1578 
646 
1695 
725 
1269 
1171 
599 
929 
739 
1490 



578 
640 

728 
963 
1343 
1130 
619 
530 
912 
762 
830 
1152 
957 
346 
338 
1068 
769 
659 
1156 
435 
486 
1503 
935 
520 
441 
610 
860 
762 
1059 
715 
532 
417 
563 
494 
595 
598 
674 
258 
1516 
493 
583 
603 
404 
902 
969 
690 
732 
758 
1461 
577 
755 
256 
1063 
450 
1140 
754 
554 
1092 
252 
663 
478 
1057 
1120 
538 
425 
606 
496 
482 
770 
1041 
912 
162 
856 
625 
965 



-11.9 
•10.3 
-21.7 
-35.3 
<-35.0 
-26.7 
•13.9 
4.7 
-12.9 
-18.9 
-5.3 
-6.5 
-28.7 
-16.3 
-17.3 
-6.4 
-2.1 
-9.3 
•13.8 
-32.0 
-17.9 
-14.6 
■3.9 
-12.4 
-31.0 
-21.2 
-3.6 
-0.5 
-6.8 
-4.6 
-0.9 
<-35.0 
-6.1 
•21.8 
-11.7 
-13.1 
-15.0 
-8.1 
-2.3 
-11.2 
-4.7 
-4.0 
-8.9 
-6.5 
-5.2 
•18.5 
-1.1 
-4.8 
-8.4 
-4.3 
-6.0 
-13.9 
-5.0 
-4.9 
-11.5 
-4.4 
-4.7 
-4,3 
-18.4 
-20.8 
-4.6 
-13.4 
-35.9 
-11.4 
-16.0 
-3.7 
-21.0 
-2.3 
•18.3 
-7.7 
-9.1 
•22.8 
-13.6 
-17.9 
-4.7 



50,800 
46.800 
42.000 
31.100 
16.300 
25.700 
48,100 
54,300 
33,900 
40,400 
37.300 
24.900 
30.600 
77.800 
79.400 
27,900 
40,100 
36,100 
24,600 
63.700 
56.200 
13.000 
33.000 
55.200 
63,000 
48,700 
36,100 
40,400 
28,300 
42.700 
54.200 
65,900 
50,400 
57,500 
49,600 
49,400 
44,900 
105,300 
12.500 
57,500 
50.400 
49.100 
67.700 
34,300 
31,700 
44,000 
41,900 
40.600 
14.400 
50,800 
40.800 
106,400 
28,100 
61,900 
25.300 
40,800 
52.500 
27,100 
106.000 
45.500 
59,000 
26.300. 
26,000 
53,700 
64,900 
48,900 
57,300 
58,600 
40,000 
28,900 
33,900 
193.700 
36.200 
47.700 
31.800 



426 

427 

428 

429 

430 

431 

432 

434 

435 

436 

437 

438 

439 

440 

441 

443 

446 

447 

448 

449 

450 

451 

452 

453 

454 

456 

457 

459 

460 

461 

462 

463 

464 

465 

466 

468 

469 

470 

471 

472 

473 

474 

475 

476 

477 

478 

479 

480 

482 

483 

485 

486 

487 

488 

489 

490 

491 

492 

493 

494 

495 

496 

497 

499 

500 

501 

502 

503 

504 

505 

506 

507 

508 

509 

510 



1296 
810 
1565 
1259 
1253 
734 
483 
518 
1020 
1122 
1870 
435 
86 
1740 
599 
743 
801 
1050 
1245 
1576 
1818 
1094 
1945 
1652 
1403 
1394 
905 
1036 
1598 
1528 

1098 
849 

1814 

1368 

1194 
577 

1140 

1797 

1293 
618 

2009 

1205 

1035 
160 
469 
599 

1009 

1216 
816 
693 

1608 
478 

1025 

1045 

1609 
775 
692 

1100 

1760 
882 
470 
494 
980 

1414 

1234 

1246 
824 

1246 

1115 

1189 

1578 
787 
979 

1153 

1730 



704 
643 
303 
847 

562 
1426 
433 
1041 
1170 
196 
673 
1102 
847 
544 
1571 
335 
668 
926 
1296 
1516 
1021 
440 
802 
894 
500 
718 
436 
581 
294 
863 

1137 

1125 

1072 
481 

1084 
467 
B88 
524 

1133 
655 
299 
215 
788 
155 

1370 
662 
540 
235 
346 
673 

1013 
599 
607 

1186 
301 

1289 
178 
964 
776 
247 

1258 

1436 
852 
546 

1072 
659 
792 

1134 

1407 
391 
402 
250 
552 
619 

1006 



-7* 

•16.0 
-OS 
-8.0 
4.1 
-18.1 
•28.5 
•26.9 
-11.6 
•9.8 
-0.5 
-31.0 

<-35.0 
-1.8 
-22.8 
-17.8 
-16.2 
-11.1 
-8.2 
-3.7 
-0.9 
•10.3 
>0.0 
•2.6 
-6.1 
-6.3 

-14.0 

-11.3 
-3.4 
-4.3 

-10.2 

-15.2 
-0.9 
-6.3 
4.9 

-23.9 
-9.6 
•1.1 
-7.6 

-21.9 
>0.0 
4.7 

-11.4 
< 35.0 

-28.9 

•22.6 

-11.8 
4.6 

-15.9 

-19.3 
-3.3 

-28.6 

•11.5 

-11.2 
-3.3 

-17.0 

-19.3 

-10.2 
-1.6 

-14.5 

-28.9 

-28.1 

-12.5 
4.0 
4.3 
4.2 

-15.7 
4.2 
-9.9 
4.9 
-3.7 

-16.6 

-12.5 
-9.4 

-2.0 



43X0 
36.800 
88.700 
36.600 
51,900 
15.500 
63.900 
26.900 
24.300 
147,600 
45,000 
26,700 
36,600 
53.200 
10.800 
80,100 
45.200 
33.300 
19.800 
12.600 
29.600 
63.100 
38.600 
34.600 
56.900 
42.600 
63.500 
50.500 
91.400 
35.900 
25,400 
25.800 
27.800 
58,700 
27.300 
60.100 
34.900 
54.800 
25.500 
46.000 
89.900 
131.300 
39.200 
207.600 
17.400 
45.600 
53,500 
117.400 
77,800 
44.900 
30.000 
49.300 
48,800 
23.700 
89,200 
20.100 
169,300 
31.800 
39,700 
110.700 
21,200 
15,200 
36.400 
53,100 
27.800 
45.700 
39,000 
25.500 
16,200 
69.700 
68.000 
109.000 
52.600 
48,100 
30.200 
511 600 484 -16.0 56,400
512 1099 533 -10.2 54,100
513 1696 1034 -ZZ 29,200
514 948 636 -13.2 47,100
515 46V 543 -28.5 53,400
516 1334 1044 -7.1 26,800
517 868 1021 -14.6 29,700
518 798 779 -16.3 39,600
519 622 670 -15.7 45,100
520 632 165 -21.5 189,000
521 1332 830 -7.1 37,300
522 603 1104 -22.6 26,600
523 1190 309 -6.9 66,800
524 479 1226 -28.6 22,300
525 766 1066 -17.2 26,000
526 747 1016 -17.7 29,600
527 1170 231 -9.2 119,600
528 1502 542 -4.6 53,400
530 1728 620 -2.0 48,000
532 507 1011 -27.4 30,000
533 870 489 -14.7 57,900
534 1347 1085 -6.8 27,300
535 1513 346 -4.5 77,800
536 308 654 <-35.0 46,000
538 1851 689 -0.7 44,100
539 1463 962 -5.1 31,100
540 909 561 -13.9 52,000
541 625 289 -21.7 93,100
542 1164 198 -9.2 146,200
543 803 655 -16.2 45,900
544 1259 1143 -6.0 25,200
545 856 1526 -15.0 12,200
546 803 1071 -16.2 27,800
547 1162 274 -9.3 96,400
548 128 1321 <-35.0 19,000
549 1355 1122 -6.6 25,900
550 595 866 -23.0 35,600
552 1369 494 -6.6 57,500
553 992 405 -12.2 67,600
555 1125 410 -9.6 66,900
556 705 975 -18.9 31,400
557 1477 1030 -4.9 29,300
558 980 583 -12.5 50,400
559 700 1109 -19.1 26,400
560 1028 621 -11.5 46,000
562 898 794 -14.1 38,900
564 789 1446 -16.6 14,900
565 777 766 -16.9 40,200
566 960 328 -12.5 81,900
567 1519 611 -4.4 48,600
569 1212 661 -6.6 45,600
570 760 594 -17.4 49,700
571 616 956 -21.9 32,100
573 1142 771 -9.6 40,000
574 532 787 -26.2 39,300
575 771 250 -17.1 109,200
576 1068 534 -10.6 54,100
577 822 734 -15.7 41,800
578 914 754 -13.8 40,800
579 1064 794 -10.6 38,900
580 1524 714 -4.4 42,800
581 1392 783 -6.3 39,400
582 962 686 -12.4 44,200
584 1467 672 -4.6 45,000
585 756 731 -17.4 41,900
586 667 1152 -19.5 24,900
587 930 523 -13.5 55,000
588 1886 774 -0.4 38,900
589 642 485 -21.1 56,300
590 1317 519 -7.3 55,300
591 65 1546 <-35.0 11,500
592 1014 614 -11.7 48,400
593 732 176 -18.1 172,300
594 1627 478 -3.0 58,000
595 1009 1426 -11.6 15,500
596 619 269 -21.6 100,500
597 1176 461 -9.1 60,700
598 1465 1044 -5.0 28,800
599 741 1188 -17.9 23,600
600 907 402 -14.0 68,000
601 687 658 -19.5 45,800
602 712 1138 -18.7 25,400
603 898 181 -14.1 165,200
604 763 1461 -16.7 14,400
605 736 223 -18.0 125,300
606 629 273 -21.6 98,700
607 1064 286 -10.6 94,000
608 663 503 -14.5 56,700
609 2012 610 >0.0 48,700
610 1255 903 -6.1 34,200
612 1103 391 -10.1 69,600
613 778 265 -16.9 102,000
614 £24 518 -15.7 55,400
615 1095 195 -10.3 149,100
616 1759 478 -1.6 59,000
617 994 372 -12.1 72,900
618 751 374 -17.6 72,400
619 1429 516 -5.7 55,300
620 1050 520 -11.1 55,200
621 923 1105 -13.7 26,600
622 1462 622 -5.1 47,900
623 759 225 -17.4 124,000
624 758 1036 -17.4 29,000
625 1438 606 -5.5 48,900



626 1096 1089 -10.2 27,200
627 942 548 -13.3 53,000
628 809 621 -16.0 48,000
629 699 979 -14.1 31,300
630 1135 1321 -9.6 19,100
631 879 615 -12.5 48,300
632 1542 1076 -4.1 27,600
633 1345 614 -6.9 38,000
634 409 950 -32.2 32,400
635 1165 704 -9.2 43,300
636 774 604 -17.0 49,000
637 1263 524 -8.0 54,800
638 952 411 -13.1 66,700
639 1717 575 -2.1 51,000
640 994 292 -12.1 92,000
641 165 1224 <-35.0 22,400
642 803 251 -16.2 108,900
643 719 296 -18.5 90,700
644 1100 294 -10.2 91,400
645 534 1263 -26.1 21,000
646 1153 1038 -9.4 29,000
648 1246 204 -8.2 140,000
649 14 1406 <-35.0 16,200
650 1713 1049 -2.1 28,600
651 1966 1183 >0.0 23,800
652 1378 816 -6.5 38,000
653 1442 1165 -5.5 24,400
654 650 806 -20.8 38,400
655 1111 551 -10.0 52,700
656 1095 861 -10.3 36,000
657 1524 540 -4.4 53,600
658 1777 860 -1.4 36,000
659 391 584 -33.4 50,400
660 977 565 -12.5 51,700
661 658 166 -20.5 187,500
662 732 312 -18.1 86,100
663 1787 567 -1.2 51,500
664 888 268 -14.4 100,900
665 889 775 -14.3 39,800
666 715 221 -18.6 126,300
667 781 227 -16.8 122,400
668 646 165 -21.0 189,100
669 1116 353 -9.9 76,300
670 1382 643 -6.4 46,600
671 547 789 -25.3 39,200
673 984 746 -12.4 41,200



448 

562 


-2.7 


62.100 


-4.4 


51.900 


642 


-18.6 


46.700 


615 


-13.7 


46.300 


551 


-10.5 


52.700 


923 


-22.7 


33,400 


1004 


-8.3 


30.300 


283 


-10.1 


95.100 


477 


-6.1 


59.100 


249 


-3.4 


109.600 


699 


-24.8 


43.500 


1313 


-9.2 


19.300 


790 


0.0 


39.100 


619 


-4.1 


48.100 


764 


-5.2 


40.300 


953 


-11.8 


32.300 


270 


>0.0 


100,200 


686 


-16.0 


34,900 


1461 


-6.4 


14,400 


819 


>0.0 


37.800 


656 


•3.0 


45,900 


254 


-13.6 


107,000 


715 


•0.6 


42,700 


345 


>0.0 


78.000 


563 


-13.0 


51,800 


730 


•4.2 


42.000 


900 


-23.8 


34.400 



~ ~ ai.wo 

705 1278 571 -7.8 51,200
706 1841 704 -0.7 43,300
707 1018 1386 -11.7 16,900
709 1074 1H5 -10.7 25,100
710 293 889 <-35.0 34,800
712 720 412 -18.5 66,600
713 1386 841 -6.4 36,800
714 1328 263 -7.1 103,100
715 698 433 -19.1 63,900
716 701 481 -19.0 58,700
717 1875 699 -0.5 43,600
718 575 702 -23.9 43,400
719 1216 204 -8.6 140,400
721 1069 464 -10.8 60,400
722 1272 506 -7.9 56,400
723 958 822 -13.0 37,700
724 763 395 -17.3 69,100
725 720 916 -18.5 33,700
726 1476 415 -4.9 66,200
727 1846 473 -0.7 59,400
728 510 783 -27.3 39,400
729 1217 1126 *.6 25,800
730 1858 724 -0.6 42,300
731 665 765 -20.2 40,300
733 1321 312 -7.2 85,900
734 719 427 -18.5 64,600
735 1101 473 -10.2 59,500
736 1359 569 -6.7 51,400
738 696 220 -19.2 127,600
739 687 409 -19.5 67,000
740 1205 256 -8.7 106,200
741 995 563 -12.1 51,900
742 898 596 -14.1 49,500
743 881 181 -14.5 165,900
744 1951 686 >0.0 44,200
745 726 168 -18.3 183,600
746 999 643 -12.0 46,600
748 182 1503 <-35.0 13,000
749 2005 649 >0.0 46,300
750 1448 575 -5.4 51,000
751 792 266 -16.5 101,900
752 469 296 -28.9 90,600
754 664 254 -20.3 107,000
755 1195 184 -8.8 161,000
756 1821 1113 -0.9 26,300
757 909 246 -13.9 111,000
760 790 133 -16.5 264,900
761


1399 


713 




41 fiftA 


* M 


1418 

1* IS 


IMS 


•ajr 


97 VY1 


7&4 


2020 


3©¥ 


ft 


91,400 


765 651 475 -20.6 56,300
766 1052 1149 -11.1 25,000
767 1968 466 >0.0 59,900
768 1330 665 -7.1 44,300
769 1S70 613 >0.0 46,500
770 857 617 -15.0 46,200
771 1337 974 -7.0 31,500
773 1576 502 -3.7 56,700
775 969 624 -12.8 37,600
776 1438 706 -5.5 43,100
777 1539 458 -4.2 61,000
778 850 434 -15.1 63,800
779 700 411 -19.1 66,800
780 1052 1136 -11.1 25,500
784 1413 529 -6.0 54,400
785 1364 885 -6.7 35,000
786 1822 635 -0.9 37,100
787 893 392 -14.3 69,500
790 616 882 -22.0 35,100
791 451 1429 -29.6 15,400
792 777 377 -16.9 72,000
793 1536 1543 -4.2 11,700
794 1461 807 -5.1 38,300
796 388 546 -33.6 53,100
797 1126 212 -9.8 133,700
798 933 437 -13.5 63,400
799 1420 593 -5.9 ^9.800
800 1756 279 -1.6 96,500
801 624 865 -21.7 35,800
802 898 547 -14.2 53,000
803 1775 1468 -1.4 14,200
804 573 196 -24.0 148,400
805 203 494 <-35.0 57,400
806 960 1039 -12.5 29,000
807 902 308 -14.1 87,200
808 625 827 -21.7 37,500
809 1851 1015 -0.7 29,900
810 440 573 -30.9 51,100
811 1358 249 -6.8 109,700
812 851 393 -15.1 69,400
813 745 1246 -17.8 21,600
814 2028 810 >0.0 38,200
815 1086 645 -10.4 46,500
816 629 313 -21.6 85,700
817 1376 1177 -6.5 24,000
818 1771 790 -1.4 39,100
819 1045 263 -11.2 103,100
820 984 362 -12.4 74,600
821 1712 279 -2.2 96,700
822 1256 205 -8.1 139,200
823 1517 654 -4.4 46,000
824 1442 449 -5.5 62,000
825 1240 513 -6.3 55,800
826 1309 1014 -7.4 29,900
827 2012 706 >0.0 43,100
828 937 1405 -13.4 16,200
830 1342 756 -7.0 40,700
831 562 626 -24.5 37,500
832 1073 1039 -10.7 29,000
833 481 620 -28.5 37,800
834 501 561 -27.6 50,500
837 751 746 .17 ft 41,100
838 635 633 -21 J 37,200
839 1494 459 -4.7 60,900
840 1952 301 >0.0 89,300
841 1565 1080 -3.6 27,500
842 571 1312 -24.1 19,400
843 1325 649 -7.2 46,300
844 1727 301 -2.0 89,200
845 630 679 -21.5 44,600
846 2016 905 >0.0 34,200
847 673 1200 -19.9 23,200
848 1863

849 1166



850
851
852
855
856
857
858
859
860
861
862
864
865
866
868
869



271 
523 
1535 1024 



1035 
634 
499 

1063 
867 

1448 
706 



826 
542 
220 
194 
890 
639 
311 



1070 1066 
472 347 



674 

1307 
645 
827 
685 
1807 
£70 1323 

671 1228 1031 

672 1904 346 
556 

1540 
1566 
1196 
1076 



480 
499 
867 
1004 
494 
402 
783 



673 
874 
875 
876 
677 



878 1161 

879 647 



860 

881 

883 

884 

885 

686 

887 

888 

889 

890 

891 

892 

894 

895 

896 

897 

898 

899 

900 

901 

903 

904 

905 

907 

906 

910 

911 

913 

914 

916 

917 

919 

920 

921 

923 

924 

925 

926 

927 

928 

929 

931 

932 

933 

934 

936 



1756 
1543 
1432 
922 
1103 
1501 
798 
636 
951 
717 
1123 
891 
1245 
1962 
1322 
420 
662 
845 
624 
931 
799 
765 
775 
868 
826 
681 
1544 
1606 
1237 
1442 
1260 
764 
1133 
1123 
829 
1131 
1441 
679 
1487 
1062 
1231 
1609 
810 
965 
947 
865 
937 1421 



647 
756 
777 
351 
720 
1111 
757 
594 
278 
690 
689 
414 
607 
1103 
634 
759 
548 
229 
413 
234 
346 
626 
570 
428 
243 
703 
1094 
229 
520 
889 
824 
1303 
1544 
301 
387 
688 
749 
367 
1541 
1123 
380 
242 
318 
874 
219 
1191 
775 
816 
670 
900 
520 
462 
843 
1056 



-0.6 


99,500 


939 


•9.2 


54,900 


941 


-4.2 


29.600 


942 


•11.4 


37.500 


943 


•15.5 


53,400 


944 


•27.8 


127.100 


945 


-10.9 


150,500 


946 


-14.4 


34,800 


947 


-5.4 


46.900 


948 


-16.9 


86.200 


949 


-10.7 


28.000 


950 


-26.8 


77,600 


951 


-19.9 


56.800 


952 


•7.4 


57.000 


954 


-21.0 


34, goo 


955 


•15.6 


30,300 


957 


-19.5 


57,400 


956 


•1.0 


66.000 


960 


•7.2 


39,400 


961 


-6.4 


29.300 


962 


-0.3 


77,700 


963 


-24.8 


46,400 


964 


-4.2 * 


40,700 


965 


-3.8 


39,700 


966 


-B.8 


76,800 


967 


-10.6 


42,500 


966 


-9.3 


26,400 


969 


-20.9 


40.700 


970 


-1.6 


49,700 


971 


•4.1 


97,100 


972 


-5.7 


34,800 


974 


•13.7 


44,100 


975 


-10.1 


66,400 


976 


-4.6 


48,900 


977 


-16.3 


26.600 


978 


-21.3 


47,200 


979 


•13.1 


40,600 


960 


-16.6 


52,900 


961 


•9.8 


121.200 


963 


-14.3 


66,400 


964 


•6.2 


117,800 


965 


>0.0 


77,700 


967 


-7.2 


47.700 


988 


-31.4 


51.300 


990 


-20.3 


64.500 


991 


-15.3 


113.000 


992 


-21.7 


43,400 


993 


-13.5 


27,000 


994 


-16.3 


121,000 


995 


-17.2 


55.200 


996 


-17.0 


34,800 


997 


-14.4 


37,600 


996 


•15.6 


19,700 




-19.7 


11.700 


1000 


-4.1 


69,100 


1001 


-3.3 


70,400 


1002 


■8.3 


44.100 


1003 


-55 


41.100 


1006 


-8.0 


73,700 


1007 


-17.3 


11.700 


1009 


-9.7 


25,900 


1010 


-9.8 


71,500 


1011 


-15.6 


113.200 


1012 


-9.7 


64,300 


1013 


-5.5 


35,400 


1014 


-19.7 


126.200 


1015 


-4.8 


23,500 


1016 


■10.5 


39.800 


1017 


-8.4 


38,000 


1018 


-3.3 


45.100 


1020 


16.0 


34,400 


1021 


12.8 


55.100 


1022 


13.2 


60.600 


1023 


14.8 


36,800 


1024 


-5.9 


28.400 


1025 



1765 
602 
312 
983 

1300 
630 
187 



860 
957 
503 



827 
.885 
472 
496 
491 
269 
423 



4J 

-13 
-22.7 
«-35.0 
-12.1 
-73 
-21.6 
736 <-35.0 
344 -6.5 



665 
193 
152 
701 
S47 
712 
816 
174 
419 
409 
320 
334 



768 
596 
557 
887 
564 
969 1155 
671 256 
1204 796 
910 154 
609 1048 
206 
232 
437 
567 



822 
976 
403 
279 



295 
664 
642 
749 1141 
642 
911 



994 



1197 



1344 317 
1024 1105 
739 1159 



816 
785 
1159 



£55 
361 
317 
928 
701 
811 
461 
847 
579 
504 
289 
290 
771 
1736 476 
643 1164 
822 487 



647 

902 
888 
1815 
1205 
617 
968 
970 



875 
291 



459 



279 
644 
745 
541 
679 661 
1818 1126 
1032 634 
1629 994 
1311 1134 
1722 424 
1015 743 
1219 
464 
63 
317 



781 
1129 
812 
785 



-13 
-11.3 
-14.9 
-13.0 
-27.6 
>0.0 
-11.8 
-17.2 
•23.0 
-24.8 
-14.4 
-24.5 
-12.6 
-20.0 
-6.7 
-13.9 
-22.3 
-7.7 
-15.8 
-12.6 
-32.6 



495 <-35.0 
961 -15.3 



739 



-9.8 
-12.1 
-3.2 
-17.7 
-10.8 
-6.8 
-1.6 
-6.9 
-11.5 
-17.9 
-15.9 
-16.7 
-9.3 
-10.4 
-11.5 
-15.2 
-14.1 
-14.4 
-0.9 
-8.7 
-22.0 
•12.8 
-12.7 
-1.9 
•21.1 
-15.8 
-14.6 
<-35.0 
-6.4 
•29.4 
-19.7 
-0.9 
-11.4 
-3.0 
-7.4 
•2.0 
-11.7 
•3.7 
•16.6 
-9.7 
•15.9 
•16.7 
-7.7 



37.500 
35.000 
59.600 
57.100 
57,700 
100.300 
65,100 
41.600 
76.200 
45,400 
151,000 
213.000 
43,400 
53.000 
42.900 
37.900 
174.900 
65,700 
67.100 
83,900 
60.500 
24,800 
106.600 
36,700 
210.300 
28.700 
138.900 
119,300 
63.400 
51,600 
57,400 
31,200 
91,100 
45.400 
46.700 
25.300 
46.700 
33.900 
12.800 
84,700 
26.600 
24.600 
52.400 
74.900 
84,500 
33.300 
43,400 
38.200 
60,700 
36.600 
50.700 
56.500 
93.100 
92.700 
40.000 
56,900 
23.700 
56.100 
96,400 
46.600 
41.200 
53,500 
45,600 
25.800 
47.200 
30,700 
25.500 
65,000 
41.300 
22.500 
58,400 
591,300 
84.600 
62,400 
41.500 
MSN X Y CPKpI SDSMW


1UA 


ATX. 


552 


-3275 


52,600 


1U«/ 




848 


•7.5 


36,500 


lUcB 


cm 


647 


-15.0 


53,000 


• MA 

luw 




226 


-7.7 


123,200 


«M1 
1IXJ 1 




- 822 


-12.3 


37,700 


10m 


1547 


403 


-4.1 


67,900 


1UJO 


1381 


551 


-6.4 


52,700 




l£93 


496 


-4.3 


57,200 


1035 


1 1 £0 


&4S 


-9.7 




1096 


* ***** 
1220 


4/4 




9C,«HJU 


1036 


1761 


ZdZ 


-I .© 




1040 


541 


639 


*»C 7 

-25.7 


36,900 


1041 818 910 -15.0 34,000
1044 1036 485 -11.3 58,300
1045 1439 407 -5.5 67,300
1047 1540 250 -4.2 109,200
1048 1576 635 -3.7 47,100
1049 1089 411 -10.4 66,700
1050 949 1040 -13.2 26,900
1051 426 816 -31.1 37,800
1052 1583 1365 -3.6 16,900
1053 779 1092 -16.8 27,000
1054 1613 620 -3.2 46,000
1055 1380 377 -6.5 72,000
1056 284 663 <-35.0 45,500
1058 1261 746 -8.0 41,200
1060 393 605 -33.3 49,000
1061 1817 645 -0.9 46,600
1062 1245 746 -8.2 41,200
1064 1256 792 -8.1 39,000
1065 705 934 -18.9 33,000
1066 1181 734 -9.0 41,800
1067 529 656 -26.3 45,800
1068 508 696 -27.4 43,700
1069 1898 604 -0.3 49,100
1071 873 609 -14.7 48,700
1073 1768 1128 -1.5 25,800
1075 836 773 -15.4 39,900
1076 1863 861 -0.6 36,000
1078 826 566 -15.7 51,600
1081 971 483 -12.7 56,500
1083 1697 202 -2.3 142,300
1085 1157 794 -9.4 38,900
1090 620 910 -21.9 34,000
1092 1867 597 -0.5 49,500
1093 2019 894 >0.0 34,600
1094 1546 538 -4.1 53,700
1095 1545 477 -4.1 59,100
1096 61 935 <-35.0 33,000
1099 1954 237 >0.0 116,000
1101 588 1048 -23.3 28,600
1102 1050 667 -11.1 45,200
1103 457 797 -29.5 38,800
1105 1684 532 [...] 54,200
1106 1714 649 -2.1 46,300
1107 1717 546 -2.1 53,100
1108 1976 722 >0.0 42,400
1111 547 1066 -25.3 28,000
1112 1348 621 S.9 48,000
1115 1385 762 -S.4 40,400
1116 1078 816 -10.6 38,000
1117 975 787 -12.6 39,300
1118 1202 933 -8.7 33,100
1119 1022 1076 -11.0 27,600
1120 1905 616 [...] 48,300
1121 1512 1301 -4.5 19,700
1122 1114 677 [...] 44,700
1123 1464 452 -5.1 61,700
1125 1048 857 -11.1 36,200
1126 1122 802 -9.8 38,600
1128 1722 892 -2.1 34,700
1133 1098 825 -10.2 37,500
1139 1630 569 -4.8 51,400
1147 764 1162 -17 J 23,800
1148 1968 724 >0.0 42,300



MSN X Y CPKpI SDSMW

1153 921 1158 -13.7 24,700
1154 1594 864 -3.5 35,900
1161 637 400 -21.3 68,400
1162 623 397 -21.8 66,800
1163 665 397 -20.2 68,700
1168 564 528 -24.4 54,500
1170 552 529 -25.0 54,500
1171 538 524 -25.9 54,800
1172 545 514 -25.5 55,700
1174 1099 522 -10.2 55,000
1176 1304 586 -7.5 50,200
1177 1366 539 -6.6 53,700
1178 1608 702 -3.3 43,400
1179 1485 224 -4.8 124,900
1180 1459 224 -5.2 124,900
1181 1431 223 -5.7 125,100
1182 1407 223 -6.1 125,200
1183 1383 224 -6.4 124,700
1184 1454 182 -5.3 164,400
1185 1422 183 -5.8 162,600
1186 1394 182 -6.3 164,300
1189 1171 214 -8.2 131,800
1190 1457 286 -5.2 94,200
1191 686 1114 -19.5 26,200
1192 265 893 <-35.0 34,700
1193 403 1292 -32.6 20,000
1194 344 1275 <-35.0 20,600
1195 505 1311 -27.6 19,400
1196 572 1293 -24.1 20,000
1197 639 1502 -21.2 13,000
1198 637 1402 -21.3 16,300
1199 614 1407 -22.1 16,200
1200 637 1431 -21.3 15,400
1201 1095 1394 -10.3 16,600
1202 1719 1545 -2.1 11,600
1203 791 668 -16.5 45,200
1204 964 1021 -12.9 29,700
1205 313 195 <-35.0 148,700
1208 306 194 <-35.0 149,800
1209 320 197 <-35.0 147,400
1210 326 197 <-35.0 146,600
1211 394 294 -33.2 91,400
1212 402 294 -32.7 91,200
1214 386 294 -33.7 91,400
1215 641 329 -21.2 81,600
1216 660 329 -20.4 81,600
1217 914 266 -13.8 101,800
1218 873 245 -14.7 112,000
1219 970 372 -12.7 72,900
1220 1021 298 -11.6 90,100
1221 1392 205 -6.3 139,500
1222 1354 203 -6.8 141,800
1223 1362 205 -6.7 139,500
1224 673 540 -19.9 53,600
1225 614 542 -22.1 53,400
1226 603 539 -22.6 53,600
1227 696 623 -19.2 47!b00
1228 707 628 -18.9 47,500
1229 475 447 -28.7 62,300
1230 466 1282 -29.0 20,400
1231 759 1461 -17.4 14,400
1232 1324 1170 -7.2 24,200
1233 1583 1005 -3.6 30,300
1234 1865 609 -0.6 38,200
1235 1812 817 -1.0 37,900
1236 1411 703 -6.0 43,400
1237 1392 682 -6.3 44,500
1238 794 410 -16.4 66,900
1239 769 407 -17.1 67,300
1240 740 406 -17.9 67,500
1241 743 511 -17.8 55,900
1242 713 510 -18.7 56,000
1243 682 509 -19.6 56,100
1244 663 504 -20.3 56,500
1245 565 582 -24.4 50,500
1260


1629 
42.800
-4.0


42.600 


1262 


1466 
1340
Table 3. Computed pI's of two sets of carbamylated protein standards: rabbit muscle CPK and human
hemoglobin (Hb)



             PIR  #ASP #GLU #HIS #LYS #ARG NH2- Calc Real
Protein Name Name 3.9  4.1  6.0  10.8 12.5 7.0  pI   CPK



0 


Rabbit muscle CPK KIRBCM 28


27 


17 


-1 


28 


27 


17 


-2 


28 


27 


17 


-3 


28 


27 


17 


-4 


28 


27 


17 


-5 


28 


27 


17 


-6 


28 


27 


17 


-7 


28 


27 


17 


-8 


28 


27 


17 


-9 


28 


27 
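The Table 3 header pairs each ionizable group with a pK value (Asp 3.9, Glu 4.1, His 6.0, Lys 10.8, Arg 12.5, amino terminus 7.0), which suggests that the "Calc pI" column is the pH at which the summed Henderson-Hasselbalch charges cancel. The following is a minimal sketch of such a calculation, not the authors' program. The Asp, Glu and His counts are taken from the CPK rows above; the Lys and Arg counts are hypothetical placeholders because those columns did not survive in this copy, and the assumption that each carbamylation step removes exactly one lysine charge is also ours.

```python
# pK values as listed in the Table 3 column header.
PK_ACIDIC = {"ASP": 3.9, "GLU": 4.1}
PK_BASIC = {"HIS": 6.0, "LYS": 10.8, "ARG": 12.5, "NH2": 7.0}


def net_charge(counts, ph):
    """Net charge at a given pH, treating every group as independent."""
    positive = sum(n / (1.0 + 10 ** (ph - PK_BASIC[g]))
                   for g, n in counts.items() if g in PK_BASIC)
    negative = sum(n / (1.0 + 10 ** (PK_ACIDIC[g] - ph))
                   for g, n in counts.items() if g in PK_ACIDIC)
    return positive - negative


def computed_pi(counts, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH at which the net charge crosses zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(counts, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


def carbamylation_train(counts, steps):
    """pI after blocking 0..steps lysine amino groups, one charge per step."""
    train = []
    for k in range(steps + 1):
        modified = dict(counts, LYS=counts["LYS"] - k)
        train.append((-k, computed_pi(modified)))
    return train


# ASP/GLU/HIS from the table; LYS, ARG and NH2 counts are hypothetical.
cpk = {"ASP": 28, "GLU": 27, "HIS": 17, "LYS": 34, "ARG": 18, "NH2": 1}
for step, pi in carbamylation_train(cpk, 5):
    print(f"{step:3d}  {pi:.2f}")
```

Each step in the printed train lowers the computed pI, which is the behaviour that makes a carbamylated standard usable as a charge ruler across the gel.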
Table 4. Computed pI's of some known proteins related to measured CPK pI's
A two-dimensional gel database of rat liver proteins
useful in gene regulation and drug effects studies

[...] two-dimensional (2-D) protein map of Fischer 344 rat liver
[...] with a tabular listing of more than 1200 protein species.
Sodium dodecyl sulfate (SDS) molecular mass and isoelectric point have been
established, based on positions of numerous internal standards. This map has been
used to [...] compare hundreds of 2-D gels of rat liver samples from a
variety of studies, and forms the nucleus of an expanding database describing rat
[...] of such a study, involving regulation of cholesterol synthesis by
cholesterol-lowering [...] is presented. Since the map has been
obtained with a widely used and highly reproducible 2-D gel system (the Iso-Dalt®
system), it can be directly related to an expanding body of work in other laboratories.
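The abstract states that molecular mass is assigned from the positions of internal standards, and the Table 1 caption refers to a standard curve. A minimal sketch of that kind of lookup follows. It assumes piecewise-linear interpolation of log10(mass) against the gel y coordinate, which is our choice and not necessarily the curve of Fig. 8. The calibration points are (Y, SDSMW) pairs copied from rows of the master table and used here only as illustrative stand-ins for real standards.

```python
import bisect
import math

# (y position, SDS molecular mass) pairs taken from master table rows
# (MSN 520, 541, 511, 514, 562, 517, 524, 591); illustrative only.
CALIBRATION = [(165, 189000), (289, 93100), (484, 56400), (636, 47100),
               (794, 38900), (1021, 29700), (1226, 22300), (1546, 11500)]


def mass_from_y(y, points=CALIBRATION):
    """Interpolate log10(mass) linearly in y between neighbouring points."""
    ys = [p[0] for p in points]
    if not ys[0] <= y <= ys[-1]:
        raise ValueError("y lies outside the calibrated range")
    i = bisect.bisect_left(ys, y)
    if ys[i] == y:
        return float(points[i][1])  # exact hit on a calibration point
    (y0, m0), (y1, m1) = points[i - 1], points[i]
    frac = (y - y0) / (y1 - y0)
    return 10 ** (math.log10(m0) + frac * (math.log10(m1) - math.log10(m0)))


print(round(mass_from_y(561), -2))
```

The same pattern would apply to the x axis, with the carbamylated CPK charge train as the calibration set, provided the standards bracket the spot of interest.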



Contents

1 Introduction
2 Materials and methods
2.1 Sample preparation
2.2 Two-dimensional electrophoresis
2.3 Staining
2.4 Positional standardization
2.5 Computer analysis
2.6 Graphical data output
2.7 Experiment LSBC04
3 Results and discussion
3.1 The rat liver protein 2-D map
3.2 Carbamylated charge standards, computed pI's and molecular mass standardization
3.3 An example of rat liver gene regulation: Cholesterol metabolism
3.3.1 MSN 413 (putative cytosolic HMG-CoA synthase) and sets of spots regulated coordinately or inversely
3.3.2 MSN 235 and coregulated spots
3.3.3 An example of an anti-synergistic effect
3.3.4 Complexity of the cholesterol synthesis [...]
4 Conclusions
5 References
6 Addendum 1: Figures 1-13
7 Addendum 2: Tables 1-4
Table 1. Master table of proteins in rat liver database
Table 2. Table of some identified proteins
Table 3. Computed pI's of two sets of carbamylated protein standards: rabbit muscle CPK and human Hb
Table 4. Computed pI's of some known proteins related to measured CPK pI's

Correspondence: Dr. N. Leigh Anderson, Large Scale Biology Corporation, 9620 Medical Center Drive, Rockville, MD 20850, USA

Abbreviations: CBB, Coomassie Brilliant Blue; CPK, creatine phosphokinase; 2-D, two-dimensional; IEF, isoelectric focusing; MSN, master spot number; NP-40, Nonidet P-40; SDS, sodium dodecyl sulfate

1 Introduction

[...] electrophoresis of proteins [...] and others has [...] to examine a wide variety
of biological systems, the results appearing in more [...]. [...] two-dimensional (2-D) gel
images and constructing spot databases, it is also possible to [...] integrated bodies of
information describing the appearance and regulation of thousands of protein [...] [5, 6].
Creating such databases involves [...] quantitative data from thousands [...] commitment in [...].

Given the investment required to develop a protein database, the choice of a biological
system takes on considerable importance. While in vitro systems are ideal for answering
[...] experimental questions, especially in cancer research, [...] our experience with [...]
and [...] liver [...] samples from rats and mice appear to show greater [...] reproducibility
(in terms of individual protein [...]) than replicate cell cultures. This is perhaps [...]
homeostasis maintained [...] (serum), conditions [...] and genetic [...] while in culture.
It is also more difficult [...] of protein from cell culture systems (particularly with
attached cells), forcing the investigator [...]-based or silver-based staining [...].
While these methods are more sensitive (sometimes much more sensitive) than the Coomassie
Brilliant Blue (CBB) stain typically used for protein detection
in "large" protein samples, they are generally more variable,
more labor-intensive and, in the case of radiographic
methods, may generate highly "noisy" images, due to the
properties of the films used. By contrast, large protein samples
can [...] be prepared from liver using urea/Nonidet
P-40 (NP-40) solubilization and stained with CBB, which
has the advantage of being easily reproducible [8]. Finally,
there remains the question of the "truthfulness" of many in
vitro systems as compared to their in vivo analogs: how
great are the changes caused by the [...] culture
and the associated shift to strong selection for growth,
and how do these affect experimental outcomes? Hence 
the apparent advantages of in vitro systems, in terms of ex- 
perimental manipulation, may be counterbalanced by 
other factors relating to 2-D data quality. 

There is a second important class of reasons for exploring 
the use of an in vivo biological system such as the liver. His- 
torically, there have been two broad approaches to the me- 
chanistic dissection of biochemical processes in intact cel- 
lular systems: genetics (a search for informative mutants) 
and the use of chemical agents (drugs and chemical toxins). 
Both approaches help us to understand complex systems 
by disrupting some specific functional element and show- 
ing us the result. With the development of techniques for 
genetic manipulation and cloning, the genetic approach 
can be effectively applied either in vitro or in vivo, although
the in vitro route is usually quicker. The chemical approach 
can also be applied to either sort of biological system; here,
however, the bulk of consistently acquired information is 
in experimental animals (rats and mice). While most biolo- 
gists know a short list of compounds having specific, experi- 
mentally useful effects (e.g., inhibitors of protein synthesis, 
ionophores, polymerase inhibitors, channel blockers, nu- 
cleotide analogs, and compounds affecting polymerization 
of cytoskeletal proteins), there is a much larger number of 
interesting chemically-induced effects, most of them char- 
acterized by toxicologists and pharmacologists in rodent 
systems. Just as a thorough genetic analysis would involve 
saturating a genome with mutations, it is possible to ima- 
gine a saturating number of drugs, the analysis of whose ac- 
tions would reveal the complete biochemistry of the cell. 
While organized drug discovery efforts usually target spe- 
cific desired effects, the nature of the process, with its de- 
pendence on screening large numbers of compounds, ne- 
cessarily produces many unanticipated effects. It is there- 
fore reasonable to suppose that the required broad range of 
compounds necessary to achieve "biochemical saturation" 
may be forthcoming; in fact, it may already exist among the 
hundreds of thousands of compounds that failed to qualify 
as drugs. 

Among organs, the liver is an obvious choice for the study 
of chemical effects because of its well-known plasticity and 
responsiveness. The brain appears to be quite plastic (e.g. 
[7]), but it is a complicated mixture of cell types requiring 
skillful dissection for most experiments. The kidney, while 
quite responsive, also presents a potentially confounding 
mixture of cell types. The liver, by contrast, is made up of 
one predominant cell type which is easy to solubilize: the 
hepatocyte, representing more than 95 % of its mass. Most 
importantly, the liver performs many homeostatic func- 
tions that require rapid modulation of gene expression. It 
appears that most chemical agents tested affect gene ex- 
pression in the liver at some dosage (N. Leigh Anderson, 
unpublished observations), an interesting contrast to our 
earlier work with lymphocytes, for example, which seem to 
be much less responsive. Such results conform to the expec- 
tation that cells with a homeostatic, physiological role 
should be more plastic than cells differentiated for a purpose
dependent on the action of a limited number of specific genes.

The liver also allows the parallels between in vitro and in 
vivo systems to be examined in detail. Significant progress
has been made in the development of mouse, rat and human
hepatocyte culture systems, as well as in precision-cut
tissue slices. Using such an array of techniques, it is possi- 
ble to assemble a matrix of mammalian systems including 
mouse and rat in vivo on one level and mouse, rat and hu- 
man in vitro on a second level, and to compare effects be- 
tween species and between systems. This approach allows 
us to draw informed conclusions regarding the biochemical
"universality" of biological responses among the mammals, 
and to offer some insight into the validity of in vitro approaches
for toxicological screening. We believe these data
will be necessary if in vitro alternatives are to achieve wide 
usage in government-mandated safety testing of drugs, con- 
sumer products and industrial and agricultural chemicals. 

A number of interesting studies have been published using 
2-D mapping to examine effects in the rodent liver. Several
investigators have made use of the technique to
screen for existing genetic variants [8-11] or induced mutations
[12-14], mainly in the mouse. This work builds on the
wealth of genetic information available on the mouse and 
its established position as a mammalian mutation-detec- 
tion system. While some studies of chemical effects have 
been undertaken in the mouse [15-17], most have used the 
rat [18-23]. The examination of the cytochrome P-450 system,
in particular, has been carried out almost exclusively
on the rat [24, 25]. 

These considerations lead us to conclude that rodent liver 
offers the best opportunity to systematically examine an 
array of gene regulation systems, and ultimately to build a 
predictive model of large-scale mammalian gene control. 
The basic underlying foundation of such a project is a reliable,
reproducible master 2-D pattern of liver, to which ongoing
experimental results can be referred. In this paper, we
report such a master pattern for the acidic and neutral pro- 
teins of rat liver (pattern F344MST3). In future, this master 
will be supplemented by maps of basic proteins, and analogous
maps of mouse and human liver.



2 Materials and methods 
2.1 Sample preparation 

Liver is an ideal sample material for most biochemical studies,
including 2-D analysis. A sample of approximately 0.5 g of tissue is
taken from the apical end of the left lobe of the liver. Solubilization
is effected as rapidly as practical; a delay of 5-15 min appears to
cause no major alteration in liver protein composition if the liver
pieces are kept cold (e.g., on ice) in the interim. In the
solubilization process, the liver sample is weighed and placed in a
glass homogenizer (e.g., 15 mL Wheaton); 8 volumes of solubilizing
solution* are added (i.e., 4 mL per 0.5 g tissue) and the mixture is
homogenized using first the loose- and then the tight-fitting glass
pestle. This takes approximately 5 strokes with each pestle and is
carried out at room temperature because urea would crystallize out in
the cold. Once the liver sample is thoroughly homogenized in the
solubilizer, it is assumed that all the proteins are denatured (by the
chaotropic effect of the urea and NP-40 detergent) and the enzymes
inactivated by the high pH (~9.5). Therefore these samples may be kept
at room temperature until they can be centrifuged or frozen as a group
(within several hours of preparation). The samples are centrifuged for
6 × 10^6 g·min (e.g., 500 000 × g for 12 min using a Beckman TL-100
centrifuge). The centrifuge rotor is maintained at just below room
temperature (e.g., 15-20°C), but not too cold, so as to prevent the
precipitation of urea. The centrifuge of choice is a Beckman TL-100
because of the sample tube sizes available, but any ultracentrifuge
accepting smallish tubes will suffice. When an appropriate centrifuge
is not available near the site of sample preparation, samples can be
frozen at -80°C and thawed prior to centrifugation and collection of
supernatants. Each supernatant is carefully removed following
centrifugation and aliquoted into at least 4 clean tubes for storage.
This is done by transferring all the supernatant to one clean tube,
mixing this gently (to assure homogeneous composition) and then
dividing it into 4 aliquots. The aliquots are frozen immediately at
-80°C. These multiple aliquots can provide insurance against a failed
run or a freezer breakdown.

* The solubilizing solution is composed of 2% NP-40 (Sigma), 9 M urea
(analytical grade, e.g., BDH or Bio-Rad), 0.5% dithiothreitol (DTT;
Sigma) and 2% carrier ampholytes (pH 9-11 LKB; these come as a 20%
stock solution, so 2% final concentration is achieved by making the
final solution 10% 9-11 Ampholine by volume). A large batch of
solubilizer (several hundred mL) is made and stored frozen at -80°C in
aliquots sufficient to provide enough for one day's estimated sample
preparation requirement. The solution is never allowed to become warmer
than room temperature at any stage during preparation or thawing for
use, since heating of concentrated urea solutions can produce
contaminants that covalently modify proteins, producing artifactual
charge shifts. Once thawed, any unused solubilizer is discarded.

2.2 Two-dimensional electrophoresis

Sample proteins are resolved by 2-D electrophoresis using the
20 × 25 cm Iso-Dalt® 2-D gel system ([26-29]; produced by LSB and by
Hoefer Scientific Instruments, San Francisco) operating with 20 gels
per batch. All first-dimensional isoelectric focusing (IEF) gels are
prepared using the same standardized batch of carrier ampholytes (BDH
4-8A in the present case, selected by LSB's batch-testing program for
rat and mouse database work**). A 10 µL sample of solubilized liver
protein is applied to each gel and the gels are run for 33 000 to
34 500 volt-hours using a progressively increasing voltage protocol
implemented by a programmable high-voltage power supply. An Angelique
computer-controlled gradient-casting system (produced by LSB) is used
to prepare second-dimensional sodium dodecyl sulfate (SDS)
polyacrylamide gradient slab gels in which the top 5% of the gel is
11%T acrylamide, and the lower 95% of the gel varies linearly from 11%
to 18%T.

This system has recently been modified so as to employ a commercially
available 30.8%T acrylamide/N,N'-methylenebisacrylamide prepared
solution (thus avoiding the handling of the solid acrylamide monomer)
and three additional stock solutions: buffer (made from Sigma pre-set
Tris), persulfate and N,N,N',N'-tetramethylethylenediamine (TEMED).
Each gel is identified by a computer-printed filter paper label
polymerized into the lower left corner of the gel. First-dimensional
IEF tube gels are loaded directly (as extruded) onto the slab gels
without [...] and held in place by [...] (produced by LSB) [...].
Second-dimensional slab gels are run overnight, in groups of 20, in
cooled DALT tanks [...]. All run parameters, reagent source and lot
information [...] by the technician responsible on a detailed,
multi-page record of the experiment.

** This material (succeeding certified batches of which are available
from Hoefer Scientific Instruments) has the most linear pH gradient
produced by any ampholyte tested except for the Pharmacia wide range
(which has an unacceptable tendency to bind high-molecular weight
acidic proteins, causing them to streak).

2.3 Staining

Following SDS-electrophoresis, slab gels are stained for protein using
a colloidal Coomassie Blue G-250 procedure [...] per box. This
procedure (based on the work of [..., 31]) involves fixation in [...]
of 50% [...] cold tap water, and transfer to 1.5 L of 34% methanol, 17%
ammonium sulfate and 2% phosphoric acid for 1 h, followed by the
addition of a gram of powdered Coomassie Blue G-250 stain. Staining
requires approximately 4 days to reach equilibrium intensity, whereupon
gels are transferred to cool tap water and their surface rinsed to
remove any particulate stain prior to scanning. Gels may be kept for
several months in water with added sodium azide. [...] remove ethanol
that would dissolve the stain (and render the system noncolloidal, with
high backgrounds). The concentrated ammonium sulfate and methanol
solution is diluted by equilibration with the water volume of the gels
to automatically achieve the correct final concentrations for colloidal
staining. Practical advantages of this approach can be summarized as
follows: (i) the [...] flat background makes computer evaluation of
small spots (max OD < 0.02) possible, especially when [...]
densitometry; (ii) up to 1500 spots can be reliably detected on many
gels (e.g., rat liver) at loadings low enough to preserve excellent
resolution; and (iii) reproducibility appears to be very good: at least
several hundred spots have coefficients of reproducibility less than
15%. This value is at least as good as previous CBB methods and
significantly better than many silver stain systems.
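The reproducibility criterion quoted above can be illustrated with a short calculation. The following Python sketch assumes that the "coefficient of reproducibility" is the coefficient of variation (sample standard deviation divided by the mean) of a spot's abundance across replicate gels; that reading, the function name and the abundance values are assumptions made for illustration, not details taken from the text.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean, in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean abundance is zero; CV is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical integrated densities of one spot on five replicate gels.
replicate_abundances = [1020.0, 980.0, 1100.0, 950.0, 1010.0]

cv_percent = coefficient_of_variation(replicate_abundances)

# Apply the 15% threshold mentioned in the text.
is_reproducible = cv_percent < 15.0
```

With these made-up values the spot would count as reproducible; a real analysis would repeat the calculation for every matched spot.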

2.4 Positional standardization 

The carbamylated rabbit muscle creatine phosphokinase
(CPK) standards [32] are purchased from BDH. Amino acid
compositions, and numbers of residues present in proteins
used for internal standardization, are taken from the
Protein Identification Resource (PIR) sequence database [33].



2.5 Computer analysis 

Stained slab gels are digitized in red light at 134 micron resolution,
using either a Molecular Dynamics laser scanner (with pixel sampling)
or an Eikonix 78/99 CCD scanner. Raw digitized gel images are archived
on high-density DAT tape (or equivalent storage media) and a greyscale
videoprint is prepared from the raw digital image as hard copy [...]
the Kepler® software system (produced by LSB), a commercially available
workstation-based software package built on some of the principles of
the earlier TYCHO system [34-41]. Procedure PROC008 is used to yield a
spotlist giving position, shape and density information for each
detected spot. This procedure makes use of digital filtering,
mathematical morphology techniques and digital masking to remove the
background, and uses full 2-D least-squares optimization to refine the
parameters of a 2-D Gaussian shape for each spot. Processing parameters
and file locations are stored in a relational database, while various
log files detailing operation of the automatic analysis software are
archived with the reduced data. The computed resolution and level of
Gaussian convergence of each gel are inspected and archived for quality
control purposes.
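The 2-D Gaussian spot model mentioned above can be made concrete with a small example. The Python sketch below is not the Kepler procedure: instead of a full least-squares refinement it uses image moments, a simpler stand-in that yields the kind of starting estimates (volume, centre and widths) such a refinement would need. The function names, the axis-aligned Gaussian form and the synthetic spot parameters are all assumptions for illustration.

```python
import math


def gaussian_spot(x, y, amplitude, x0, y0, sigma_x, sigma_y):
    """Density of an axis-aligned 2-D Gaussian spot at pixel (x, y)."""
    zx = (x - x0) / sigma_x
    zy = (y - y0) / sigma_y
    return amplitude * math.exp(-0.5 * (zx * zx + zy * zy))


def moment_estimates(image):
    """Estimate volume, centre and widths of a single background-free spot.

    image is a list of rows; image[y][x] is the density at pixel (x, y).
    """
    volume = sum(sum(row) for row in image)
    x0 = sum(x * v for row in image for x, v in enumerate(row)) / volume
    y0 = sum(y * v for y, row in enumerate(image) for v in row) / volume
    var_x = sum((x - x0) ** 2 * v for row in image for x, v in enumerate(row)) / volume
    var_y = sum((y - y0) ** 2 * v for y, row in enumerate(image) for v in row) / volume
    return volume, x0, y0, math.sqrt(var_x), math.sqrt(var_y)


# Synthetic spot on a 41 x 41 pixel window (hypothetical parameter values).
truth = dict(amplitude=0.5, x0=20.0, y0=18.0, sigma_x=2.0, sigma_y=3.0)
image = [[gaussian_spot(x, y, **truth) for x in range(41)] for y in range(41)]

volume, x0, y0, sx, sy = moment_estimates(image)

# Analytical integrated density of a 2-D Gaussian: 2 * pi * A * sigma_x * sigma_y.
expected_volume = 2.0 * math.pi * truth["amplitude"] * truth["sigma_x"] * truth["sigma_y"]
```

Moment estimates are only reliable for an isolated spot after background removal; overlapping spots are the case where a simultaneous least-squares fit, as described in the text, becomes necessary.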

Experiment packages are constructed using the Kepler ex- 
periment definition database to assemble groups of 2-D 
patterns corresponding to the experimental groups (e.g., 
treated and control animals). Each 2-D pattern is matched 
to the appropriate "master" 2-D pattern (pattern 
F344MST3 in the case of Fischer 344 rat liver), thereby 
providing linkage to the existing rodent protein 2-D data- 
bases. The software allows experiments containing hun- 
dreds of gels to be constructed and analyzed as a unit, with 
up to 100 gels displayed on the screen at one time for com- 
parative purposes and multiple pages to accommodate ex- 
periments of > 1000 gels. For each treatment, proteins 
showing significant quantitative differences vs. appropriate
controls are selected using group-wise statistical parameters
(e.g., Student's t-test, Kepler® procedure STUDENT).
Proteins satisfying various quantitative criteria (such as P <
0.001 difference from appropriate controls) are represented
as highlighted spots onscreen or on computer-plotted
protein maps and stored as spot populations (i.e., logical
vectors) in a liver protein database. Quantitative data
(spot parameters, statistical or other computed values) are
stored as real-valued vectors in the database. Analysis of
coregulation is performed using a Pearson product-moment
correlation (Kepler® procedure CORREL) to determine
whether groups of proteins are coordinately regulated by
any of the treatments. Such groups can be presented graphically
on a protein map, and reported together with the statistical
criteria used to assess the level of coregulation.
Multivariate statistical analysis (e.g., principal components
analysis) is performed on data exported to SAS (SAS Institute).
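The two statistics named above are standard and can be written out briefly. The Python sketch below is not the Kepler STUDENT or CORREL procedure, whose internals are not described here; it shows a pooled-variance two-sample Student's t statistic and a Pearson product-moment correlation coefficient, applied to hypothetical spot abundances. All function names and data values are assumptions.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(group_a), len(group_b)
    dof = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / dof
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, dof


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length vectors."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical abundances of one spot in five control and five treated animals.
control = [100.0, 110.0, 90.0, 105.0, 95.0]
treated = [200.0, 210.0, 190.0, 205.0, 195.0]
t_value, dof = student_t(treated, control)

# Hypothetical abundances of three spots across the same five gels.
spot_a = [1.0, 2.0, 3.0, 4.0, 5.0]
spot_b = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises with spot_a
spot_c = [10.0, 8.0, 6.0, 4.0, 2.0]   # falls as spot_a rises

r_coordinate = pearson_r(spot_a, spot_b)
r_inverse = pearson_r(spot_a, spot_c)
```

A coefficient near +1 corresponds to coordinate regulation and one near -1 to the inverse pattern; converting the t statistic to a P value requires the t distribution with the returned degrees of freedom.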

2.6 Graphical data output 

Graphical results are prepared in GKS and translated
within Kepler® into output for any of a variety of devices.
Line-drawing output is typically prepared as PostScript and
printed on an Apple LaserWriter. Detailed maps presented
here have been generated using an ultra-high-resolution
PostScript-compatible Linotronic output device. Greyscale
graphics are reproduced from the workstation screen using
a Seikosha videoprinter. Patterns are shown in the standard
orientation, with high molecular mass at the top and acidic
proteins to the left.

2.7 Experiment LSBC04 

In the study described here, 12-week-old Charles River male F344 rats
were used. Diets were prepared at LSB, based on a Purina 5755M Basal
Purified Diet. Lovastatin and cholestyramine were obtained as
prescription pharmaceuticals, ground and mixed with the diet at
concentrations of 0.075% and [...]%, respectively. The high cholesterol
diet was Purina 5801M-A (5% cholesterol plus 1% sodium cholate in the
control diet). Animal work was carried out by Microbiological
Associates (Bethesda, MD). Animals were acclimatized for one week on
the control diet, fed test or control diets for one week, and
sacrificed on day 8. Average daily doses of lovastatin and
cholestyramine in appropriate groups were 37 mg/kg/day and 5 g/kg/day,
respectively, based on the weight of the food consumed. Liver samples
were collected and prepared for 2-D electrophoresis according to the
standard liver protocol (homogenization in 8 volumes of 9 M urea, 2%
NP-40, 0.5% dithiothreitol, 2% LKB pH 9-11 carrier ampholytes, followed
by centrifugation for 30 min at 80 000 × g). Kidney, brain and plasma
samples were frozen. Gels were run as described above, and the data
were analyzed using the Kepler® system. Gels were scaled, to remove the
effect of differences in protein loading, by setting the summed
abundances of a large number of matched spots equal for each gel
(linear scaling).
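The linear scaling step can be sketched as follows. This Python example assumes one gel is scaled against a reference set of abundances by a single multiplicative factor chosen so that the summed abundance of the matched spots agrees; the function names, spot numbers and abundance values are hypothetical and chosen only to show the arithmetic.

```python
def scaling_factor(gel_spots, reference_spots, matched_ids):
    """Factor that makes the summed abundance of matched spots equal the reference sum."""
    gel_sum = sum(gel_spots[i] for i in matched_ids)
    ref_sum = sum(reference_spots[i] for i in matched_ids)
    return ref_sum / gel_sum


def scale_gel(gel_spots, factor):
    """Apply one multiplicative factor to every spot on the gel."""
    return {spot: value * factor for spot, value in gel_spots.items()}


# Hypothetical abundances keyed by master spot number.
reference = {413: 1000.0, 235: 400.0, 367: 600.0}
gel = {413: 1500.0, 235: 450.0, 367: 550.0, 1250: 120.0}
matched = [413, 235, 367]

factor = scaling_factor(gel, reference, matched)
scaled = scale_gel(gel, factor)
```

Because a single factor is applied to every spot, ratios between spots on the same gel are unchanged; only differences in total loading are removed.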

3 Results and discussion 

3.1 The rat liver protein 2-D map 

F344MST3 is a standard 2-D pattern of rat liver proteins, based on the
Fischer 344 strain. This pattern was initiated from a single 2-D gel
and extensively edited in an experiment comparing it to a range of
protein loads, so as to include both small spots and well-resolved
representations of high-abundance spots. More than 700 rat liver 2-D
patterns have been matched to F344MST3 in a series of drug effects and
protein characterization experiments, and numerous new spots (induced
by specific drugs, for instance) have been added as a result. A
modified version including additional spots present in the
Sprague-Dawley outbred rat has also been developed (data not shown).
Figure 1 shows a greyscale representation and Fig. 2 a schematic plot
of the master pattern. More than 1200 spots are included, most of which
are visible on typical gels loaded with 10 µL of solubilized liver
protein prepared by the standard method and stained with colloidal
Coomassie Blue. Master spot numbers (MSN's) have been assigned to all
proteins, and appear in the following figures, each showing one
quadrant of the pattern. Figure 3 shows the upper left (acidic, high
molecular mass) quadrant, Fig. 4 the upper right (basic, high molecular
mass) quadrant, Fig. 5 the lower left (acidic, low molecular mass)
quadrant, and Fig. 6 the lower right (basic, low molecular mass)
quadrant. The quadrants overlap as an aid to moving between them. The
gel position (in 100 micron units), isoelectric point (relative to the
CPK internal pI standards) and SDS molecular mass (from the calibration
curve in Fig. 8) are listed for each spot (Table 1). Because of the
precision of the CPK-pI values, these parameters can be used to relate
spot locations between gel systems more reliably than using pI
measurements expressed as pH. A major objective of current studies is
the identification of all major spots corresponding to known liver
proteins, as well as rigorous definitions of subcellular organelle
contents. Of particular interest to us is the parallel development of
identifications in the rat and mouse liver maps, allowing detailed
comparisons of gene expression effects in the two systems. The results
of these studies will be presented systematically in a later edition of
this database, [...] identifications as an aid to other users of the
rat liver pattern (Table 2).

3.2 Carbamylated charge standards, computed pI's and
molecular mass standardization

We have previously shown that the use of a system of closely-spaced
internal pI markers (made by carbamylating a basic protein) offers an
accurate and workable solution to the problem of assigning positions in
the pI dimension [...]. The same system, based on 56 protein species
made by carbamylating rabbit muscle CPK, has been used here to assign
pI's to most rat liver acidic and neutral proteins. The standards were
coelectrophoresed with total liver proteins and the standard spots
added to a special version of the master pattern F344MST3. The gel
X-coordinates of all liver protein spots lying within the CPK charge
train were then transformed into CPK pI positions by interpolation
between the immediately adjacent standards [...] (Table 1) using a
Kepler® vector procedure.

It has proven possible to compute fairly accurate pI values for many
proteins from the amino acid composition [42]. We have attempted here
to test a further elaboration of this approach, in which we computed
pI's for the CPK standards themselves, based on our knowledge of the
[...] CPK sequence and the fact that adjacent members of the [...]
lysine residue (Table 3). We compared these [...] computed pI's for an
additional set of carbamylated standards made from human hemoglobin
beta chains and a series of rat liver and human plasma proteins of
known position [...] (Table 4) and protein disulphide isomerase ([...]
in the table). The FABP spot present on F344MST3 may represent a
charge-modified version of a more basic [...] not resolved in the
IEF/SDS gel. Of particular importance is the fact that by comparing
computed pI's of sequenced but unlocated proteins with the CPK pI's, we
can assign a probable gel location without making any assumptions
regarding the actual gel pH gradient. This offers a useful shortcut,
given the vagaries of pH measurement on small diameter IEF gels. We
have used this approach to compute the CPK pI's of all rat and mouse
proteins in the PIR sequence database as an aid to protein
identification (data not shown).
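The interpolation of spot positions between neighbouring charge standards, described earlier in this section, can be sketched in a few lines. The Python example below assumes simple linear interpolation between the two standards that bracket a spot's gel X-coordinate; the text does not state the exact form used by the Kepler vector procedure, and the function name, coordinates and charge indices are hypothetical.

```python
from bisect import bisect_right


def cpk_position(x, standards):
    """Linearly interpolate a CPK charge position for gel X-coordinate x.

    standards is a list of (x_coordinate, charge_index) pairs sorted by
    x_coordinate, one pair per spot of the carbamylation train.
    """
    xs = [s[0] for s in standards]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("spot lies outside the CPK charge train")
    i = bisect_right(xs, x)
    if i == len(xs):  # x coincides with the last standard
        return float(standards[-1][1])
    (x_lo, c_lo), (x_hi, c_hi) = standards[i - 1], standards[i]
    return c_lo + (c_hi - c_lo) * (x - x_lo) / (x_hi - x_lo)


# Hypothetical standards: gel X-coordinate and charge index relative to
# unmodified CPK (more negative means more carbamylated, more acidic).
standards = [(500.0, -10), (560.0, -9), (630.0, -8), (710.0, -7)]

midway = cpk_position(595.0, standards)   # halfway between the -9 and -8 standards
at_edge = cpk_position(710.0, standards)  # exactly on the last standard
```

Spots outside the charge train are rejected rather than extrapolated, matching the restriction in the text to spots lying within the CPK train.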



[...] inhibitor of HMG-CoA reductase [...]. Such an experiment offers
[...]. Twenty-one proteins were found to be affected [...]



In order to standardize SDS molecular weight (SDS-MW), we have used a
standard curve fitted to a series of identified proteins (Fig. 8).
Rather than using molecular mass per se, [...] the number of amino
acids in the polypeptide chain, as perhaps a better indication of the
length of the SDS-coated rod that is sieved by the second dimension
slab. The resulting values were multiplied by 112 (the weighted average
mass of amino acids in sequenced proteins) to give predicted molecular
masses. Because we use gradient slabs, we have not constrained the
fitted curve to conform to any predetermined model; rather we tried
many equations and selected the best using the program "Tablecurve" on
a PC. The equation chosen was of the form a + [...] + c/[...], where
[...] is the number of residues [...]
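The conversion from residue count to predicted molecular mass is a single multiplication by the average residue mass of 112 quoted above. The Python sketch below shows only that step; the function name and the example residue count are hypothetical, and the calibration curve that maps gel Y-position to residue count is not reproduced.

```python
AVERAGE_RESIDUE_MASS_DA = 112.0  # weighted average residue mass quoted in the text


def predicted_mass_kda(n_residues):
    """Predicted molecular mass in kDa for a chain of n_residues amino acids."""
    if n_residues <= 0:
        raise ValueError("residue count must be positive")
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Hypothetical example: a polypeptide of 500 residues.
example_mass_kda = predicted_mass_kda(500)
```

For a 500-residue chain this gives 56 kDa; real proteins deviate from the average according to their composition.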



3.3.1 MSN 413 (putative cytosolic HMG-CoA synthase) and sets of spots
regulated coordinately or inversely

[...] tentatively identified as the cytosolic HMG-CoA synthase [...].
This enzyme lies immediately before HMG-CoA reductase [...] computed
from the known sequence of the hamster enzyme [43].

Using a classical product-moment correlation test (Kepler® procedure
CORREL), a series of five additional spots was found to be coregulated
with 413. The level of correlation was exceedingly high (p > 95%). Two
of these, 1250 and 933, are at similar molecular weights and
approximately one charge more acidic than 413 (Fig. 9), indicating that
they may be covalently modified forms of [...]. This suspicion is
strengthened by the observation that [...] spots are also stained by
the antibody to cytosolic HMG-CoA synthase. The remaining three
coregulated spots appear to comprise an additional related pair (1253
and 1001) of around 40 kDa and a single spot (1119) of around 21 kDa.
Because these two presumed proteins are present at substantially lower
abundances than 413, and because the cytosolic HMG-CoA synthase is
reported to consist of only one type of polypeptide, they are likely to
represent other, very tightly coregulated enzymes. A second group of
six spots was selected based on a regulatory pattern close to the
inverse of that for spot 413 (MSN's 34, 79, 176, 182, 204, 347; data
not shown). For these proteins, the lowest level of expression occurs
with exposure to lovastatin plus cholestyramine and the highest level
upon exposure to the high-cholesterol diet. Spots 182 and 79 are highly
correlated and lie about one charge apart at the same molecular weight;
they may thus be isoforms of a single protein. The other four spots
probably represent additional enzymes or subunits.

3.3.2 MSN 235 and coregulated spots

A third group of five spots, mainly comprised of mitochondrial proteins
including putative mitochondrial HMG-CoA synthase spots, showed a
modest induction by lovastatin alone, but little or no effect with any
of the other treatments (including the combination of lovastatin and
cholestyramine; Fig. 12). This result is intriguing because lovastatin
was expected to affect only the regulation of enzymes of cholesterol
synthesis, which is entirely extra-mitochondrial. Three of the spots
(235, 134, 144) form a closely-packed triad at approximately 30 kDa,
and are likely to represent isoforms of one protein. All three spots
are stained by an antibody to the mitochondrial form of HMG-CoA
synthase obtained from Dr. Greenspan. Subcellular fractionation
indicates a mitochondrial location. The other two spots (633 at about
38 kDa and 724 at about 69 kDa) are each present at lower abundance
than the members of the triad.

3.3.3 An example of an anti-synergistic effect

A sixth spot (367) shows strong induction by lovastatin 
(two- to threefold), and about half as much induction with 
lovastatin plus cholestyramine, but without sharing the ani- 
mal-animal heterogeneity pattern of the 235-set (Fig. 13). 
This protein is also mitochondrial, and represents the clear- 
est example of an anti-synergistic effect of lovastatin and 
cholestyramine. The existence of such an effect demon- 
strates that lovastatin and cholestyramine do not act exclu- 
sively through the same regulatory pathway. 

3.3.4 Complexity of the cholesterol synthesis pathway

Taken together, these results suggest that treatment with lovastatin
alone can affect both cytosolic and mitochondrial pathways using
HMG-CoA, while cholestyramine, on the other hand, either alone or in
combination with lovastatin, produces a strong effect on the putative
cytosolic pathway, but little or no effect on the putative
mitochondrial pathway. An explanation for this difference may lie in
lovastatin's effect on levels of HMG-CoA and related precursor
compounds that are exchanged between the cytosol and the mitochondrion,
whereas cholestyramine should affect only the cytosolic pathways
directly controlled by cholesterol and bile acid levels. It remains to
be explained why some proteins of the putative mitochondrial pathway
are so much more variable in their expression in all groups. An
examination of all the coregulated groups suggests that quantitative
statistical techniques can extract a wealth of interesting information
from large sets of reproducible gels. The abundances of spots in the
413 coregulation group, for example, show an amazing level of
concordance in their relative expression among the five individuals of
the lovastatin and cholestyramine treatment group. This effect is not
due to differences in total protein loading, since they have already
been removed by scaling, and since proteins with quite different
regulation patterns can be demonstrated (e.g., Fig. 13). Such effects
raise the possibility that many gene coregulation sets may be revealed
through the study of a sufficiently large population of control animals
(i.e., without any experimental manipulation). This approach,
exploiting natural biological variation in protein expression instead
of drug effects, offers an important incentive for the construction of
a large library of control animal patterns.



4 Conclusions 

Because of the widespread use of rat liver in both basic biochemistry
and in toxicology, there is a long-term need for a
comprehensive database of liver proteins. The rat liver mas- 
ter pattern presented here has proven to be an accurate re- 
presentation of this system, having been matched to more 
than 700 gels to date. As the number of proteins identified 
and the number of compounds tested for gene expression 
effects grows, we expect this database to contribute valu- 
able insights into gene regulation. Its practical utility in sev- 
eral areas of mechanistic toxicology is already being de- 
monstrated. 

Received September 11, 1991

Figure 1. Synthetic representation of the standard rat liver 2-D master
pattern, rendered as a greyscale image using a videoprinter.

Figure 3. Upper left (high molecular weight, acidic) quadrant (#1) of
the rat liver map, showing spot numbers.

Figure 4. Upper right (high molecular weight, basic) quadrant (#2) of
the rat liver map, showing spot numbers.

Figure 7. (a) Plot of computed isoelectric point versus gel X-position
for two sets of carbamylated standard proteins (rabbit muscle CPK [...]
and human hemoglobin β chain, filled diamonds) and several other
proteins (shaded squares). (b) The identities of the various proteins
represented by the squares are indicated by the numbers in
corresponding positions on (a); these refer to Table 4.

Figure 8. Plot of number of amino acids versus gel Y-position, with
fitted curve used to predict molecular mass of unidentified proteins.

Figure 9. Montage showing effects in the region of MSN:413. The montage
shows a small window into one portion of the 2-D pattern, one row of
windows for each experimental group, and one panel for each gel in the
experiment. The left-most pattern in each row is a group-specific copy
of the master pattern followed by the patterns for the five individual
rats in the group. The highlighted protein spots (filled circles) are
spot 413 (on the right of each panel; identified as cytosolic HMG-CoA
synthase) and two modified forms of it (1250 and 933). From the top,
the rows (experimental groups) are: high cholesterol, controls,
cholestyramine, lovastatin, and lovastatin plus cholestyramine.

Figure 10. Bargraph showing the quantitative effects of various
treatments on the abundance of MSN:413 (cytosolic HMG-CoA synthase) in
the gels of Fig. 9.

Figure 11. Bargraphs of a series of six coregulated spots including
MSN:413. In the bargraphs, the abundances of the appropriate spot
(master spot number shown at the top of the panel) in each animal are
shown. The five five-animal groups are in the order (left to right):
high cholesterol, controls, cholestyramine, lovastatin, and
lovastatin plus cholestyramine. Each bar 
within a group represents one experimen- 
tal animal liver (one 2-D gel). Note the cor- 
related expression of the 6 spots, espe- 
cially in the two far right (most strongly in- 
duced) groups. 
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367 




Figure IS. Data on spot MSN:367, presented as in Fig. U.This protein 
shows unambiguously the anti-synergistic effect of lovastalin and choles- 
tyramine (fifth group) as compared to lovastalin (fourth group). This res- 
ponse contrasts strongly with the regulation pattern seen in Fig. 11. 



7 Addendum 2: Tables 1-4 

Table 1. Master table of proteins in the rat liver database a) 



MSN   X     Y     CPKpI   SDSMW
3     311   434   <-35.0  63,800
5     566   263   -24.3   102,900
8     812   426   -16.0   64,800
11    549   266   -25.2   101,000
15    645   520   -15.3   55,200
17    629   589   -21.6   50,000
18    906   414   -14.0   66,300
19    755   298   -17.5   90,200
20    649   403   -20.9   67,900
21    1204  448   -6.7    62,100
22    332   434   <-35.0  63,800
23    787   424   -16.6   65,000
24    313   417   <-35.0  66,000
25    807   516   -16.1   55,500
27    1184  524   -9.0    54,900
28    1263  446   -8.0    62,400
29    743   605   -17.8   49,000
30    768   112   -17.2   346,600
32    1216  417   -8.6    66,000
33    1145  445   -9.5    62,500
34    1037  555   -11.3   52,400
35    863   412   -14.9   66,600
36    712   606   -18.7   46,900
38    763   694   -17.3   43,800
39    304   470   <-35.0  59,800
41    1165  569   -9.2    51,400
42    684   607   -19.6   48,800
43    1318  589   -7.3    50,000
44    1624  362   -0.1    74,600
46    1203  586   -8.7    50,200
47    1391  447   -6.3    62,300
48    309   454   <-35.0  61,500
49    605   567   -22.5   50,100
50    621   535   -21.8   53,000
51    1113  522   -10.0   55,000
52    1620  499   -0.9    57,000
53    725   177   -18.3   170,800
54    2001  500   >0.0    56,900
55    722   830   -18.4   37,300
56    678   533   -19.8   54,100
57    1682  302   -2.5    [illegible]
58    1091  580   -10.3   50,600
59    1171  585   -9.2    50,300
60    1400  624   -6.2    47,800
61    1853  508   -0.6    56,200
62    1888  567   -0.4    51,500
65    735   297   -16.1   90,500
66    1263  312   -6.0    85,900
67    1252  407   -6.1    67,300
68    779   682   -16.8   43,900
69    1064  296   -10.8   90,800
71    656   589   -20.6   50,000
72    636   545   -21.2   53,100
73    1582  563   -3.6    50,400
74    1570  556   -3.6    52,300
75    1264  621   -8.0    46,000
76    1338  564   -7.0    51,800
77    1633  363   4.8     74,400
78    1767  565   -1.5    51,700
79    925   738   -13.6   41,600
80    534   698   -26.1   43,600
81    1811  363   -1.0    74,500
82    1412  681   -6.0    44,500
83    1471  347   -5.0    77,500
84    1662  563   -2.7    51,800
85    1596  479   -3.4    56,900
86    1817  301   -0.9    89,100
87    516   1371  -27.0   17,400
88    1569  698   -3.5    43,600
89    1706  719   -2.2    42,500
90    651   329   -20.6   81,700
91    1415  710   -6.0    43,000
92    1773  545   -1.4    53,200
93    1338  446   -7.0    62,300
94    1708  696   -2.2    43,700

95 
96 
97 
98 
99 
100 
101 
102 
103 
104 
105 
106 
107 
108 
109 
110 
111 
113 
114 
115 
116 
117 
118 
120 
121 
122 
123 
124 
125 
126 
127 
128 
129 
130 
131 
132 
133 
134 
135 
136 
137 
138 
139 
140 
141 
142 
143 
144 
145 
146 
147 
148 
149 
150 
151 
152 
153 
154 
155 
156 
157 
158 
159 
160 
161 
162 
164 
166 
167 
168 
169 
170 
171 
172 
173 



1119 
1731 
1033 
1406 
£78 
2004 
1106 
462 
665 
773 
312 
1769 
1565 
1662 
1482 
778 
1728 
1191 
1296 
682 
1146 
1546 
1050 
1530 
636 
1572 
23 
€21 
1296 
672 
1000 
1229 
U22 
1776 
1930 
660 
666 
1271 
1161 
453 
1658 
1504 
1466 
1689 
311 
1366 
1429 
615 
2006 
2006 
1070 
1347 
541 
1645 
1269 
1507 
1722 
932 
1031 
1970 
1258 
1275 
1663 
1034 
1953 
1020 
1566 
1905 
1340 
1506 
1338 
1969 
800 
476 
919 



536 
756 
566 
565 
1149 
538 
623 
455 
630 
1162 
1117 
509 
720 
807 
593 
516 
700 
660 
185 
907 
610 
648 
577 
62S 
423 
712 
1433 
1474 
862 
921 
717 
311 
832 
409 
757 
537 
1019 
862 
1369 
1063 
623 
697 
707 
756 
1417 
915 
346 
1017 
566 
516 
1106 
578 
1461 
760 
236 
911 
446 
503 
294 
664 
163 
417 
620 
527 
771 
1462 
606 
565 
161 
563 
678 
541 
378 
958 
1314 



-9.9 
-2.0 
-11.4 
-6.1 
-23.8 
>0.0 
-10.1 
-28.5 
-20.2 
•17.0 

<-35.0 
-1.5 
-3.6 
-2.4 
-4.8 

-16.9 
-2.0 
-6.9 
-7.5 

-19.6 
-9.5 
-4.1 

-11.1 
-4.3 

-15.4 
-3.8 
«-35.0 

-21.9 
-7.5 

-14.7 

-12.0 

-6.4 

-5.6 

-1.4 

-0.1 
-20.4 
-20.2 

-7.9 

-9.3 
•29.7 

•0.6 

-4.6 



-2.4 
<-35.0 
-6.7 
-5.7 
-22.1 
>0.0 
>0.0 
•10.7 
-6.9 
•25.7 
-28 
-7.9 
-4.5 
-2.1 
-13.5 
•11.4 
>0.0 

-e.i 

-7.8 
•2.6 
•11.4 
>0.0 
-11.6 
•3.8 
-0.2 
-7.0 
-4.6 
-7.0 
>0.0 
•16.3 
-28.7 
•13.7 



53.800 
40.700 
51.600 
51.700 
25.000 
53,700 
47.900 
61,300 
37,300 
23,800 
26.100 
56,100 
42.500 
36.300 
49.700 
55.50C 
43.500 
44.500 
160.800 
34.100 
46.700 
36,500 
50.800 
37,400 
65.200 
42.000 
15.300 
13.900 
36.00C 
33.S0C 
42.60C 
86,100 
37,3a 
57,000 
40.70C 
53,800 
29,700 
36,000 
16,800 
26.100 
37.700 
43.70C 
43.20C 
40.700 
15.800 
33,800 
77,900 
29.800 
51,600 
55.300 
26.500 
50.800 
13.700 
40.500 
117.000 
33,900 
62.100 
56.600 
91.400 

44.400 

162.400 

65.900 

37.800 

54.600 

40,000 

13.700 

38.400 

51,700 
164,900 

50,400 

44.700 

53,500 

71.800 

32.100 

19.300 



a) Master table of proteins in the rat liver database, showing spot master number, position (x and y), 
predicted molecular mass (from the standard curve of Fig. 8). 



174 

175 
177 
178 
179 
180 
181 
162 
184 
185 
186 
187 
188 
191 
192 
193 
194 
195 
196 
197 
196 
199 
200 
201 
202 
203 
204 
205 
206 
207 
208 
210 
211 
213 
214 
215 
216 
217 
218 
219 
220 
221 
223 
225 
226 
227 
228 
229 
230 
232 
234 
235 
236 
237 
238 
239 
240 
241 
242 
243 
244 
245 
246 
247 
246 
249 
250 
251 
252 
253 
254 
255 
256 
257 
256 



1364 
825 
1582 
1321 
1069 
1866 
411 
804 
1860 
1997 
279 
773 
1538 
1560 
1618 
1469 
1380 
784 
1227 
667 
2006 
1711 
872 
292 
736 
786 
1224 
439 
1994 
1895 
240 
1700 
902 
1087 
1340 
1591 
1585 
1159 
931 
713 
1479 
965 
934 
1612 
621 
1586 
1065 
1577 
1456 
1440 
1692 
616 
920 
952 
1611 
1469 
501 
1820 
1357 
711 
1855 
1189 
551 
1346 
460 
1733 
1974 
806 
874 
753 
995 
1690 
994 
508 
1517 



183 
393 

553 
710 
615 
567 
295 
730 
896 
1017 
1113 
296 
807 
674 
687 
555 
266 
632 
1185 
553 
681 
674 
424 
435 
253 
629 
589 
983 
571 
667 
1416 
400 
517 
684 
668 
405 
755 
393 
572 
177 
011 
027 
716 
1045 
411 
1483 
567 
890 
496 
649 
469 
1004 
1138 
1006 
541 
720 



569 
656 
1162 
621 
474 
459 
604 
446 
451 
788 
392 
553 
646 
450 
679 
1006 
464 
620 



•6.7 
•15.7 
-3.6 
-7.2 
•10.4 
4.5 
-32.1 
-16.2 
-0.6 
>0.0 
<-35.0 
-17.0 
-4.2 
-3.9 
-0.9 
-5.0 
-6.4 
-16.7 
-6.4 
-20.1 
>0.0 
-2.2 
-14.7 
<-35.0 
-18.0 
•16.7 
-8.5 
•30.9 
>0.0 
-0.3 
c-35.0 
-2.3 
-14.1 
-10.4 
-7.0 
-3.5 
-3.6 
-9.3 
-13.5 
-18.7 
-4.9 
-12.8 
-13.5 
•1.0 
-15.8 
•3.6 
-10.8 
-3.7 
•5.2 
•5.5 
-2.4 
-22.0 
-13.7 
-13.1 
•3.2 
-4.6 
•27.7 
-0.9 
-6.8 
-16.7 
-0.6 
-6.9 
-25.1 
-6.9 
-29.3 
-1.9 
>0.0 
-16.1 
-14.6 
-17.6 
-12.1 
-2.4 
•12.1 
-27.4 
-44 



162.900 
69X0 
52.600 
43.000 
48.300 
51.600 
91.200 
42.000 
34.500 
29,800 
26.300 
90.800 
38,400 
44,000 
44.200 
52.400 
101.600 
47.300 
23.700 
52.600 
44.500 
44.000 
65.000 
63,700 
107.800 
37.400 
50.000 
31,100 
51,300 
44.200 
15.800 
57,000 
55,400 
44,400 
45.200 
57.300 
40.700 
69.300 
51.200 
170.500 
33.900 
33,300 
42,700 
28.800 
66.800 
13.600 
51.600 
34,600 
57.300 
36.500 
57,000 
30,300 
25.400 
30.200 
53.500 
42.500 
62.100 
51.400 
45.800 
23.800 
46.000 
59,300 
61.000 
49.100 
62.100 
61,800 
39.200 
69.500 
52.500 
36,500 
61.000 
44.600 
30,200 
60,400 
37.J 



isoelectric point relative to CPK standards, and 

[illegible]  [illegible]  [illegible]  [illegible]  [illegible]
260   661   [illegible]  [illegible]  [illegible]
261   1725  679   [missing]  44,000
262   496   1127  -26.0   25,800
263   1063  172   [illegible]  [illegible]
265   1390  673   [missing]  [missing]
266   510   437   -27.3   63,400
267   660   1038  -20.4   29,000
268   430   961   -31.0   31,900
269   1044  606   -11.2   48,900
270   2019  853   >0.0    36,300
271   857   422   -15.0   65,200
272   895   968   -14.2   31,700
274   1292  712   -7.6    42,900
275   1350  590   -6.9    49,900
276   1670  1089  -2.6    27,100
277   668   538   -19.4   53,700
278   961   718   -13.0   42,600
279   879   570   -14.5   51,300
281   1648  1064  -0.7    27,300
282   1505  525   -4.6    54,800
283   1313  1147  -7.3    25,100
284   1314  829   -7.3    37,400
285   1332  408   -7.1    67,200
286   1277  652   -7.8    46,100
288   1391  824   -6.3    37,600
289   1147  579   -9.5    50,700
290   825   511   -13.6   55,900
291   787   1476  -16.6   13,900
292   1462  816   -5.1    37,800
293   531   449   -26.3   62,000
294   860   698   -14.9   43,600
295   1162  609   -9.3    46,700
296   218   814   <-35.0  36,000
297   1377  979   -6.5    31,300
299   913   1523  -13.9   12,400
300   2012  667   >0.0    45,300
301   702   178   -19.0   169,200
302   494   1280  -28.1   20,400
303   403   1008  -32.6   30,100
304   1843  1585  -0.7    10,300
305   1049  503   -11.1   48,800
306   1606  989   -3.3    30,900
307   1219  916   -8.5    33,700
308   1627  755   -3.0    40,700
309   1524  892   -4.4    34,700
310   1769  1028  -1.5    29,400
311   1609  1451  -3.3    14,700
312   266   1406  <-35.0  16,100
313   1902  1365  -0.3    17,600
314   1316  1395  -7.3    16,600
315   1341  523   -7.0    54,900
318   1104  1053  -10.1   26,500
320   1480  1459  -4.9    14,400
321   850   603   -15.1   49,100
322   1454  1494  -5.3    13,300
323   670   626   -20.0   47,700
324   655   101   -20.6   420,500
325   1521  675   -4.4    44,800
326   1567  677   -3.6    44,700
327   1388  409   -6.3    67,000
328   448   1291  -30.0   20,100
330   1608  751   -3.3    40,900
331   1566  697   -3.8    43,700
332   531   471   -26.3   59,600
333   784   1156  -16.7   24,700
334   1059  407   -10.9   67,300
335   1593  303   -3.5    86,500
336   1616  598   -3.2    49,400
338   1854  1004  -0.6    30,300
339   1265  888   -8.0    34,900
340   581   585   -23.6   50,300
341   1497  1047  -4.7    26,700
343   1351  265   -6.8    102,200
344   1613  549   -0.9    52,800





MSN   X     Y     CPKpI   SDSMW
345   1006  578   -11.9   50,800
346   1095  640   -10.3   46,800
347   625   728   -21.7   42,000
348   361   963   -35.3   31,100
349   110   1343  <-35.0  16,300
350   521   1130  -26.7   25,700
351   912   619   -13.9   48,100
352   1574  530   -3.7    54,300
353   961   912   -12.9   33,900
354   706   762   -18.9   40,400
355   1450  630   -5.3    37,300
356   1374  1152  -6.5    24,900
357   474   997   -26.7   30,600
358   796   346   -16.3   77,800
359   764   338   -17.3   79,400
360   1384  1068  -6.4    27,900
361   1713  769   -2.1    40,100
362   1161  659   -9.3    36,100
363   814   1156  -13.8   24,800
364   412   435   -32.0   63,700
365   741   486   -17.9   56,200
366   678   1503  -14.6   13,000
367   1560  935   -3.9    33,000
368   963   520   -12.4   55,200
369   434   441   -31.0   63,000
370   639   610   -21.2   48,700
371   1567  660   -3.6    36,100
372   1875  762   -0.5    40,400
373   1351  1059  -6.8    28,300
374   1506  715   -4.6    42,700
375   1823  532   -0.9    54,200
376   254   417   <-35.0  65,900
377   1409  583   -6.1    50,400
378   621   494   -21.8   57,500
379   1017  595   -11.7   49,600
381   953   596   -13.1   49,400
382   856   674   -15.0   44,900
383   1252  258   -6.1    105,300
384   1699  1518  -2.3    12,500
385   1042  493   -11.2   57,500
386   1490  563   -4.7    50,400
387   1554  603   -4.0    49,100
388   1193  404   -8.9    67,700
389   1374  902   -6.5    34,300
390   1456  969   -5.2    31,700
391   718   690   -18.5   44,000
392   1799  732   -1.1    41,900
393   1482  758   -4.8    40,600
394   1227  1461  -8.4    14,400
395   1530  577   [illegible]  50,800
396   1410  755   -6.0    40,800
397   912   256   -13.9   106,400
399   1465  1063  -5.0    28,100
400   1473  450   -4.9    61,900
401   1029  1140  -11.5   25,300
403   1516  754   -4.4    40,800
404   1495  554   -4.7    52,500
405   1525  1092  -4.3    27,100
406   723   252   -18.4   108,000
409   650   663   -20.8   45,500
410   1501  478   -4.6    59,000
411   536   1057  -13.4   28,300
412   350   1120  -35.9   26,000
413   1033  538   -11.4   53,700
415   737   425   -18.0   64,900
416   1578  606   -3.7    46,900
417   646   496   -21.0   57,300
418   1695  482   -2.3    56,600
419   725   770   -18.3   40,000
420   1289  1041  -7.7    28,900
421   1171  912   -9.1    33,900
422   599   162   -22.8   193,700
423   929   856   -13.6   36,200
424   739   625   -17.9   47,700
425   1490  965   -4.7    31,800



Electrophoresis 1991, 12, 907-930 



MSN   X     Y     CPKpI   SDSMW
[illegible]  1296  704   -7.6    43,300
[illegible]  810   843   -16.0   36,800
[illegible]  1565  303   -3.9    86,700
[illegible]  1259  847   -8.0    36,600
[illegible]  1253  562   -6.1    51,900
[illegible]  734   1426  -18.1   15,500
[illegible]  463   433   -28.5   63,900
434   516   1041  -26.9   26,900
435   1020  1170  -11.6   24,300
436   1122  196   -9.8    147,600
437   1870  673   -0.5    45,000
438   435   1102  -31.0   26,700
439   86    847   <-35.0  36,600
440   1740  544   -1.8    53,200
441   599   1571  -22.8   10,800
443   743   335   -17.8   80,100
446   801   668   -16.2   45,200
447   1050  926   -11.1   33,300
448   1245  1298  -8.2    19,800
449   1576  1516  -3.7    12,600
450   1818  1021  -0.9    29,600
451   1094  440   -10.3   63,100
452   1945  802   >0.0    38,600
453   1652  894   -2.8    34,600
454   1403  500   -6.1    56,900
456   1394  718   -6.3    42,600
457   905   436   -14.0   63,500
459   1038  581   -11.3   50,500
460   1598  294   -3.4    91,400
461   1528  863   -4.3    35,900
462   1098  1137  -10.2   25,400
463   849   1125  -15.2   25,800
464   1814  1072  -0.9    27,800
465   1388  481   -6.3    58,700
466   1194  1084  -8.9    27,300
468   577   467   -23.9   60,100
469   1140  888   -9.6    34,900
470   1797  524   -1.1    54,800
471   1293  1133  -7.6    25,500
472   618   655   -21.9   46,000
473   2009  299   >0.0    89,900
474   1205  215   -8.7    131,300
475   1035  788   -11.4   39,200
476   160   155   <-35.0  207,600
477   469   1370  -28.9   17,400
478   599   662   -22.8   45,600
479   1009  540   -11.8   53,500
480   1216  235   -8.6    117,400
482   616   346   -15.9   77,800
483   693   673   -19.3   44,900
485   1608  1013  -3.3    30,000
486   478   599   -28.6   49,300
487   1025  607   -11.5   48,800
488   1045  1186  -11.2   23,700
489   1609  301   -3.3    89,200
490   775   1269  -17.0   20,100
491   692   178   -19.3   169,300
492   1100  964   -10.2   31,800
493   1760  776   -1.6    39,700
494   882   247   -14.5   110,700
495   470   1258  -28.9   21,200
496   494   1436  -28.1   15,200
497   980   652   -12.5   36,400
499   1414  546   -6.0    53,100
500   1234  1072  -8.3    27,800
501   1246  659   -8.2    45,700
502   824   792   -15.7   39,000
503   1246  1134  -8.2    25,500
504   1115  1407  -9.9    16,200
505   1189  391   -8.9    69,700
506   1578  402   -3.7    68,000
507   787   250   -16.6   109,000
508   979   552   -12.5   52,600
509   1153  619   -9.4    48,100
510   1730  1006  -2.0    30,200






MSN   X     Y     CPKpI   SDSMW



511 800 464 

512 1099 533 
£13 1606 1034 

514 046 636 

515 461 543 

516 1334 1 044 

517 866 1021 

518 708 779 

519 E22 670 

520 632 165 

521 1332 630 

522 603 1104 

523 1100 309 

524 479 1226 

525 768 1066 

526 747 1 016 

527 1170 231 
526 1 502 542 
530 1728 620 

532 507 1011 

533 870 486 

534 1347 1065 

535 1513 346 



536 306 

538 ie51 

539 1463 

540 909 

541 625 

542 1164 

543 803 



544 1259 1143 

545 656 1526 

546 803 1071 

547 1162 274 
546 128 1321 

549 1355 1122 

550 595 666 
552 1369 494 
£53 992 405 
555 1125 410 
£56 705 975 

557 1477 1030 

558 960 563 

559 700 1109 

560 1028 621 
562 698 794 

564 789 1446 

565 777 766 

566 980 328 

567 1519 611 

569 1212 661 

570 760 594 

571 618 956 
£73 1142 771 
£74 532 767 
575 771 250 
£76 1066 534 

577 822 734 

578 914 754 

579 1064 794 

560 1524 714 

561 1362 763 

562 962 666 

564 1467 672 

565 756 731 

566 667 1152 

567 930 523 
566 1866 774 
569 642 465 
590 1317 519 



•16.0 56,400 

•10.2 54,100 

£3 25.200 

-13.2 47.100 

•2e.5 53.400 

•7.1 26.800 

•14.8 29.700 

•16.3 39,600 

•15.7 45.100 

•21.5 ie9,000 

-7.1 37,300 

•22.6 26,600 

-e.9 86.800 

-26.6 22.300 

•17.2 26,000 

-17.7 25,800 

-9.2 116,600 

-4.6 53,400 

-2.0 46,000 

-27.4 30,000 

-14.7 £7,900 

-6.9 27.300 

-4.5 77,800 



654 <-35.0 46,000 
669 -0.7 44,100 
962 -5.1 31,100 
561 -13.6 52.000 
2B9 -21.7 93.100 
196 -9.2 146,200 

655 -16.2 4£.9O0 



-6.0 25,200 

-15-0 12.200 

•16.2 27,800 

-6.3 96,400 

<-3S.O 19.000 

-6.6 25,900 

-23.0 35,800 

-6.6 57.500 

-12.2 67.600 

•9.8 66.900 

•18.9 31,400 

-4.9 25.300 

•12.5 50.400 

•19.1 26.400 

-11.5 46,000 

-14.1 36,900 

•16.6 14,900 

•16.9 4G.200 

•12.5 61,900 

-4.4 46.600 

•6.6 45,600 

-17.4 4S.700 

-21.9 32.100 

-9.6 40.000 

-26.2 39,300 

•17.1 109,200 

-10.6 54.100 

-15.7 41,800 

-13.6 40,800 

•10.8 36.900 

-44 42.800 

-6.3 39.400 

-12.4 44.200 

-4.8 45.000 

-17.4 41,900 

-19.5 24,900 

-13.5 ££,000 

-0.4 36.900 

-21.1 56.300 

-7.3 55,300 



591   65    1546  <-35.0  11,500
592   1014  614   -11.7   48,400
593   732   176   -18.1   172,300
594   1627  476   -3.0    59,000
595   1009  1426  -11.6   15,500
596   619   266   -21.9   100,500
597   1176  461   -6.1    60,700
598   1465  1044  -5.0    26,800
599   741   1168  -17.9   23,600
600   907   [illegible]  -14.0   66,000
601   667   656   -19.5   45,800
602   712   1136  -18.7   25,400
603   896   161   -14.1   165,200
604   763   1461  -16.7   14,400
605   736   223   -18.0   125,300
606   [illegible]  273   -21.6   96,700
607   1064  286   -10.6   94,000
608   663   503   -14.5   56,700
609   2012  610   >0.0    46,700
610   1255  903   -8.1    34,200
612   1103  391   -10.1   69,600
613   776   265   -16.9   102,000
614   [illegible]  516   [illegible]  [illegible]
615   1055  195   -10.3   149,100
616   1759  476   -1.6    59,000
617   994   372   -12.1   72,900
618   751   374   -17.6   72,400
619   1429  516   -5.7    55,300
620   1050  520   -11.1   55,200
621   923   1105  -13.7   26,600
622   1462  622   -5.1    47,900
623   759   225   -17.4   124,000
624   756   1036  -17.4   29,000
625   1436  606   -5.5    48,900
626   1096  1069  -10.2   27,200
627   942   548   -13.3   53,000
628   605   621   -16.0   48,000
629   899   979   -14.1   31,300
630   1135  1321  -9.6    19,100
631   579   615   -12.5   48,300
632   1542  1076  -4.1    27,600
633   1345  814   -6.9    [illegible]
634   409   950   -32.2   32,400
635   1165  704   -9.2    43,300
636   774   604   -17.0   49,000
637   1263  524   -8.0    54,800
638   552   411   -13.1   66,700
639   1717  575   -2.1    51,000
640   994   292   -12.1   52,000
641   165   1224  <-35.0  22,400
642   803   251   -16.2   106,900
643   719   296   -16.5   90,700
644   1100  294   -10.2   91,400
645   534   1263  -26.1   21,000
646   1153  1038  -9.4    25,000
648   1246  204   -8.2    140,000
649   14    1406  <-35.0  16,200
650   1713  1049  -2.1    26,600
651   1986  1183  >0.0    23,800
652   1378  616   -6.5    38,000
653   1442  1165  -5.5    [illegible]
654   650   806   -20.8   36,400
655   1111  551   -10.0   52,700
656   1095  861   -10.3   36,000
657   1524  540   -4.4    53,600
658   1777  660   -1.4    36,000
659   391   584   -23.4   50,400
660   677   565   -12.5   51,700
661   656   166   -20.5   167,500
662   732   312   -16.1   86,100
663   1767  567   [illegible]  [illegible]
664   888   266   -14.4   100,900
665   889   775   -14.3   35,800
666   715   221   -18.6   126,300
667   781   227   -16.8   122,400
668   646   165   -21.0   185,100
669   1116  353   -9.6    76,300
670   1382  643   -6.4    46,600
671   547   789   -25.3   35,200
673   984   746   -12.4   41,200



MSN   X     Y     CPKpI   SDSMW



674 1661 448 

675 1523 562 

676 706 642 
877 916 615 

678 1065 551 

679 600 923 

680 1237 1004 
661 1103 283 
682 1406 477 
663 1596 249 

684 £55 609 

685 1167 1313 
886 1932 790 
867 1545 616 
668 1456 764 

689 1011 953 

690 1995 270 

691 812 888 

652 1154 1461 

653 1 993 819 

694 1626 656 

695 926 254 

696 1854 715 

697 1997 345 
696 957 563 
699 1 540 730 

702 577 900 

703 1610 562 

705 1 278 571 

706 1841 704 

707 1018 1386 

709 1074 1145 

710 293 889 

712 720 412 

713 1386 841 

714 1328 263 

715 696 433 

716 701 481 

717 1875 699 

718 575 702 

719 1216 204 

721 1069 464 

722 1272 506 

723 958 822 

724 763 395 

725 720 916 

726 1476 415 

727 1 646 4 73 

728 510 763 

729 1217 1126 

730 1858 724 

731 665 765 

733 1321 312 

734 719 427 

735 1101 473 

736 1 359 569 

738 696 220 

739 687 409 

740 1205 256 

741 995 563 
"42 89e 596 

743 681 181 

744 1951 686 

745 726 168 

746 999 643 

748 182 1503 < 

749 2005 649 

750 1448 £75 

751 792 266 

752 469 296 

754 664 254 

755 1195 184 

756 1821 1113 

757 909 246 
760 790 133 



. -2.7 62.100 

-4.4 51.900 

-18.8 46,700 

-13.7 48.300 

•10.5 52.700 

•22.7 33,400 

-8.3 30.300 

-10.1 95.100 

-6.1 59,100 

-3.4 109.800 

-24.8 43.500 

•9.2 15.300 

0.0 39,100 

-4.1 48,100 

•5.2 40,300 

•11-8 32.300 

>0.0 100.200 

-16.0 34,900 

-9.4 14,400 

>0.0 37.800 

-3.0 45.900 

-13.6 107,000 

-0.6 42.700 

>0.0 78.000 

•13.0 51.800 

-4.2 42.000 

-23.8 34,400 

-3.2 51,900 

•7.8 51.200 

-0 7 43.300 

-11.7 16.900 

•10.7 25,100 

<-35.0 34,800 

-16.5 66.600 

-6.4 36.800 

-7.1 103,100 

-19.1 63.900 

-19.0 58.700 

-0.5 43.600 

•23.9 43,400 

-6.6 140.400 

•10.8 60.400 

•7.9 56.400 

-13.0 37,700 

-17.3 69.100 

-18.5 33.700 

-4.9 66,200 

-0-7 59.400 

-27.3 39.400 

-6.6 25,800 

-0 6 42.300 

-20.2 40,300 

-7.2 85,900 

■18.5 64,600 

-10.2 59.500 

-6.7 51.400 

-192 127,600 

•19.5 67.000 

-8.7 106.200 

•12.1 51.900 

-14.1 49,500 

-14.5 165,900 

>0.0 44.200 

-18.3 183.600 

-12.0 46.600 

•-35.0 13.000 

>0.0 46.300 

-5.4 51,000 

•16.5 101,900 

•2e.9 90,600 

-20.3 107,000 

-6.8 161.000 

-0.9 26.300 

•13.9 111.000 

•16.5 264.900 



MSN   X     Y     CPKpI   SDSMW



761 
763 
764 

765 

766 

767 

766 

769 

770 

771 

773 

775 

776 

777 

778 

779 

780 

764 

765 

766 

767 

79C 

791 

792 

753 

794 

796 

797 

796 

799 

600 

601 

602 

803 

B04 

805 

806 

807 

808 

809 

810 

811 

ei2 
ei3 

614 

615 

616 

817 

618 

619 

£20 

621 

622 

823 

824 

625 

626 

627 

628 

830 

631 

832 

633 
834 
837 
836 
839 
640 
841 
842 
643 



1399 
1416 

2020 
651 
1052 
1968 
1330 
1970 
657 
1337 
1576 
969 

1438 

1539 
850 
700 

1052 

1413 

1364 

1822 
893 
616 
451 
777 

1536 

1461 
388 

1126 

933 

1420 

1759 

624 

898 
1775 

573 

203 

960 

902 

625 

iesi 



845 
846 
847 



1356 
651 
745 
2026 
1086 
629 
1376 
1771 
1045 
964 
1712 
1256 
1517 
1442 
1240 
1309 
2012 
937 
1342 
562 
1073 
481 
501 
751 
635 
1494 
1952 
1585 
571 
1325 
1727 
630 
2016 
673 



733 
1085 
569 
475 
1149 
468 
665 
613 
617 
974 
502 
624 
706 
456 
434 
411 
1136 
529 
885 
835 
392 
682 
1429 
377 
1543 
807 
546 
212 
437 
593 
279 
865 
547 
1468 
196 
494 
1039 
306 
627 
1015 
573 
249 
393 
1246 
610 
645 
313 
1177 
790 
263 
362 
279 
205 
654 
449 
513 
1014 
708 
1405 
756 
626 
1039 
820 
561 
748 
833 
459 
301 
1060 
1312 
649 
301 
679 
905 
1200 



4.2 
-5.9 
>0.0 
-20.6 
-11.1 
>0.0 
-7.1 
>0.0 
-15.0 
-7.0 
-3.7 
-12.8 
-5.5 
-4.2 
•15.1 
-19.1 
-11.1 
-6.0 
-6.7 
-0.9 
-14.3 
-22.0 
•29.8 
•16.9 
-4.2 
-5.1 
-33.6 
-9.8 
-13.5 
•5.9 
-1.6 
-21 .7 
-14.2 
-1.4 
-24.0 
<-35.0 
-12.5 
-14.1 
-21.7 
-0.7 
•30.9 
-6.8 
-15.1 
-17.8 
>0.0 
-10.4 
-21.6 
-6.5 
•1.4 
-11.2 
•12.4 
-2.2 
•8.1 
-4.4 
-5.5 
-6.3 
-7.4 
>0.0 
-13.4 
-7.0 
-24.5 
-10.7 
-26.5 
-27.8 
-17.6 
-21 .3 
•4.7 
>0.0 
•3.6 
•24.1 
-7.2 
-2.0 
•21 .5 
>0.0 
•19.9 



41.800 
27.300 
51,400 
56 .300 
25.000 
5S.900 
44.300 
4£,500 
46.200 
31.500 
56.700 
37,600 
43.100 
61.000 
63.800 
66.800 
25.500 
54.400 
35.000 
37.100 
69.500 
35,100 
1E.400 
72,000 
11.700 
36,300 
53,100 
133.700 
63.400 
49,600 
96.500 
35,800 
53,000 
U.200 
146.400 
57.400 
25,000 
C7.200 
37.500 
29.900 
51.100 
105,700 
69,400 
21.600 
36.200 
46,500 
65.700 
24.000 
39.100 
103.100 
74.600 
96.700 
139.200 
46.000 
62.000 
55.800 
29,900 
43.100 
16,200 
40.700 
37,500 
29.000 
37,800 
50.500 
41.100 
37.200 
60.900 
89.300 
27.500 
19.400 
46.300 
89.200 
44.600 
34.200 
23,200 



648 
649 

650 
651 
652 
655 
656 
657 
656 
859 
860 
861 
862 
664 
865 
866 
666 
869 
670 
671 
E72 
£73 
£74 
675 
£76 
£77 
£76 
679 
660 
661 
863 
664 
665 
686 
667 



889 

890 

e9l 

892 

694 

895 

896 

697 

698 

899 

900 

901 

903 

904 

905 

907 

906 

910 

911 

913 

914 

916 

917 

919 

920 

921 

923 

924 

925 

S26 

927 

526 

529 

931 

932 

933 

934 

936 

937 



1863 
1166 
1535 
103 5 
634 
499 
1063 
667 
1448 
706 
1C70 
472 
674 
1307 
645 
627 
665 
1807 
1323 
1228 
1904 
556 
1540 
1566 
1198 
1C76 
1161 
647 
1756 
1543 
1432 
922 
1103 
1501 
796 
636 
551 
717 
1123 
891 
1245 
1962 
1322 
420 
662 
845 
624 
931 
799 
765 
775 
688 
£28 
681 
1544 
1606 
1237 
1442 
1260 
764 
1133 
1123 
629 
1131 
1441 
679 
1487 
1062 
1231 
1609 
610 
965 
947 
665 
1421 



271 
523 
1024 
626 
542 
220 
194 

eso 

639 

311 
1066 
347 
480 
499 
667 
10O4 
494 
402 
763 
1031 
346 
647 
756 
777 
351 
720 
1111 
757 
594 
276 
e90 
689 

414 

607 
1103 
634 
759 
54fi 
229 
413 
234 
346 
£26 
570 
428 
243 
703 
1094 
229 
520 
689 
£24 
1303 

1544 

301 
387 
688 
749 
367 
1541 
1123 
380 
242 
318 
874 
219 
1191 
775 

ei6 

670 
900 
520 
462 
843 
1056 



-0.6 
-5.2 
<2 

•11.4 

•U.5 
-27.8 
•10.9 

•14.4 
•54 

-16.9 
-10.7 
•26.6 
-1S.5 
-7.4 
-21.0 
-15.6 
•19.5 
-1.0 
-7.2 
-6.4 
-C.3 
•24.6 
-4.2 ' 
-3.fi 

-e.e 

-10.6 

-9.3 
-20.9 

-1 .6 

-4.1 

•5.7 
-13.7 
-10.1 

-16.3 
-21.3 
•13.1 
-16.6 

-5.6 
-14.3 

-6.2 

>0.0 

-7.2 
-31.4 
-20.3 
-15.3 
-21.7 
-13.5 
-16.3 
•17.2 
-17.0 

•14 4 

-15.6 
-197 

-4.1 

•3.3 

-e.3 

-5.5 

-6.0 
-17.3 

-9.7 

-9.8 
•15.6 

-9.7 

•5.5 
-19.7 

-4.8 
-10.5 

-8.4 

-3.3 
•16.0 
•12.6 
•13.2 
-14.8 

•5.9 



99.500 
54.900 
25,600 
37,500 
53.400 
127,100 
150.500 
34.800 
46.900 
86.200 
26,000 
77.600 
56.600 
57,000 
34.900 
30.300 
57,400 
66.000 
39.400 
29.300 
77,700 
46.400 
40.700 
39.700 
76,600 
42.500 
26.400 
40,700 
49,700 
57,100 
34,800 

44,100 

66.400 
46.900 
26.600 
47.200 
40.600 
52.900 

121.200 
66.400 

117,800 
77,700 
47700 
51.300 
64,500 

113.000 
43.400 
27.000 

121.000 
55.200 
34.800 
37.600 
19.700 
11,700 
89.100 
70.400 
44.100 

41.100 
73.700 
11.700 

25.900 

71.500 
113.200 

84,300 

35.400 
126.200 

23.500 

39.800 

38.000 

45,100 

34.400 

55.100 

60.600 

36.800 

28.400 



MSN   X     Y     CPKpI


twv 


1197 


627 


-8.6 


w+ 1 


1765 


885 


-1.5 


»4< 


602 


472 


-22.7 


»43 


312 


496 


<-35.0 


944 


9S3 


491 


-12.1 


945 


1300 


269 


-7.5 


946 


630 


423 


-21.6 


947 


187 


736 


<-35.0 


948 


1380 


344 


-65 


949 


1766 


665 


-1.5 


950 


1038 


193 


-11.3 


951 


860 


152 


-14.9 


9S2 


957 


701 


-13.0 


954 


503 


547 


-27.6 


955 


1938 


712 


>0.0 


957 


1010 


816 


-11.8 


959 


768 


174 


-17.2 


960 


596 


419 


-23.0 


961 


557 


409 


-24.6 


962 


867 


320 


-14.4 


963 


564 


334 


-24.5 


964 


969 


1155 


•12.8 


965 


671 


255 


-20.0 


966 


1204 


798 


-8.7 


967 


910 


154 


•13.9 


968 


609 


1048 


•22.3 


969 


1285 


206 


-77 


570 


622 


232 


-15.6 


971 


976 


437 


-12.6 


972 


403 


567 


-32.6 


974 


279 


495 


<-35.0 


975 


844 


981 


-15.3 


576 


1124 


295 


-9.8 


577 


994 


664 


-12.1 


576 


1612 


642 


•3.2 


579 


749 


1141 


-177 


960 


1064 


642 


-10.8 


961 


1197 


911 


•8.6 


963 


1762 


1508 


-1.6 


964 


1344 


317 


-6.9 


965 


1024 


1105 


-11.5 


967 


739 


1159 


-17.9 


988 


816 


555 


•15.9 


990 


785 


361 


-16.7 


991 


1159 


317 


-9.3 


992 


1090 


928 


-10.4 


993 


1030 


701 


•11.5 


994 


847 


811 


-15.2 


995 


902 


461 


■ -14.1 


996 


888 


647 


-14.4 


997 


1815 


579 


-0.9 


998 


1205 


504 


-8.7 




617 


289 


-22.0 


1000 


968 


290 


-12.8 


1001 


S70 


771 


-12.7 


1002 


1736 


478 


-1.9 


1003 


643 


1 184 


■21.1 


1006 


822 


487 


-is.e 


1007 


875 


279 


•14.6 


1009 


291 


644 


<-35.0 


1010 


1386 


745 


-6.4 


1011 


459 


541 


-29.4 


1012 


679 


661 


•197 


1013 


1818 


1128 


-0.9 


1014 


1032 


634 


•11.4 


1015 


1629 


994 


-3.0 


1016 


1311 


1134 


-7.4 


1017 


1722 


424 


•2.0 


101B 


1015 


743 


-11.7 


1020 


1574 


1219 


-3.7 


1021 


781 


484 


-16.6 


1022 


1129 


83 


-97 


1023 


812 


317 


•15.9 


1024 


785 


446 


•167 


1025 


1290 


739 


•77 



SDSMW 



37.500 
35.000 
56.600 
57,100 
57.700 
100.300 
65.100 
41.600 
76.200 
415.400 
151.000 
213.000 
43.400 
53.000 
42.900 
37.900 
174.900 
65.700 
67,100 
63,900 
80.500 
24.800 
106.600 
38.700 
210.300 
2e,700 
138.900 
119.300 
63.400 
51.600 
57,400 
31.200 
91.100 
45,400 
46,700 
25.300 
46.700 
33.900 
12.800 
84,700 
26.600 
24.600 
52.400 
74.900 
84.500 
33.300 
43,400 
38.200 
60.700 
36,600 
50,700 
56.500 
93.100 
92.700 
40.000 
58.900 
23.700 
56.100 
96.400 
46.600 
41.200 
53.500 
45.600 
25.600 
47,200 
30.700 
25.500 
65.000 
41.300 
22.500 
58.400 
591.300 
84.600 
62.400 
41.500 
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MSN 



1026 
1C27 
1C28 
1030 
1C31 
1032 
1C33 
1034 
1035 
1036 
1036 
1040 
10*1 
10*4 
1045 
1047 
104£ 

104S 

1050 
1061 
10*2 
1053 
1064 
1065 
1056 
1056 
1060 

ioei 

1062 

1064 

1065 

1066 

1067 

1066 

1069 

1C71 

1073 

1075 

1076 

1078 

1061 

1063 

1065 

1090 

1092 

1093 

1094 

1095 

1096 

1099 

1101 
1102 
1103 
1105 
1106 
1107 
1106 
1111 
1112 
1115 
1116 
1117 
1116 
1119 
1120 
1121 
1122 
1123 
1125 
1126 
1126 
1133 
1139 
1147 
1148 



405 
1298 
656 
1264 
666 
1547 
1361 
1525 
1128 
1226 
1761 
541 
616 
1036 
1439 
1540 
1576 
1069 
949 
426 
1563 
779 
1613 
1380 
264 
1261 
3S3 
1617 
1245 
1256 
705 
1161 
529 
506 
1896 
673 
1766 
636 
1863 
£26 
671 
1697 
1157 
620 
1667 
2019 
1546 
1545 
61 
1954 
566 
1050 
457 
1664 
1714 
1717 
1976 
547 
1348 
1385 
1076 
975 
1202 
1022 
1905 
1512 
1114 
1464 
1048 
1122 
1722 
1098 
1630 
764 
1968 



552 
646 

547 
- 226 
622 
403 
551 
496 
645 
274 
262 
639 
910 
465 
407 
250 
635 
411 
1040 
618 
1365 
1052 
620 
377 
663 
746 
605 
645 
746 
792 
634 
734 
656 
696 
604 
609 
1128 
773 
661 
566 
483 
202 
794 
910 
597 
694 
538 
477 
935 
237 
1046 
667 
797 
532 
649 
546 
722 
1066 
621 
762 
816 
787 
933 
1076 
616 
1301 
677 
452 
657 
802 
692 
625 
569 
1162 
724 



-3215 
-7.5 
-15.0 
-7.7 
-12.3 
-4.1 
-6.4 
-4.3 
-9.7 
4.5 
-1.6 
-25.7 
-15.8 
-11.3 
-5.5 
-4.2 
-3.7 
-10.4 
-13.2 
-31.1 
-3.6 
-16.8 
-3.2 
4.5 
<-35.0 
-6.0 
-33.3 
-0.9 

-e.2 
4.1 

-18.9 
-9.0 
-26.3 
-27.4 
-0.3 
-14.7 
-1.5 
-15.4 
-0.6 
-15.7 
-12.7 
-2.3 
-9.4 
-21.9 
-0.5 
>0.0 
-4.1 
-4.1 
<-35.0 
>0.0 
-23.3 
-11.1 
29.5 
-0.4 
-2.1 
-Z1 
>0.0 
-25.3 
-6.9 
-6.4 
-10.6 
-12.6 
-8.7 
-11.6 
-0.3 
-4.5 
-9.9 
-5.1 
-11.1 
-9.8 
-2.1 
-10.2 
-0.8 
•17 J 
>0.0 



Y  CPK pI  SDS MW 



52.600 
36.500 
£3,000 
123,200 
37.700 
67.900 
12.700 
£7.200 
46.500 
66.300 
1C3.600 
36.900 
34,000 
56.300 
67,300 
1 0S .200 
47.100 
66,700 
26,900 
37,600 
16.900 
27,000 
46.000 
72.000 
46.500 
41,200 
46.000 
46,600 
41.200 
39,000 
33.000 
41.600 
45,800 
43.700 
49,100 
46.700 
25.800 
35,900 
36.000 
£1,600 
56,500 
142,300 
36.900 
34,000 
46,500 
34,600 
53.700 
59,100 
33,000 
116,000 
28.600 
45.200 
38.800 
54.200 
46.300 
53.100 
42.400 
28.000 
48.000 
40,400 
36.000 
39.300 
33.100 
27.600 
48.300 
19.700 
44.700 
61.700 
36.200 
38,600 
34,700 
37.500 
51.400 
23.600 
42.300 



1153 
1154 
1161 
1162 
1163 
1166 
1170 
1171 
1172 
1174 
1176 
1177 
1176 
1179 
1180 
1161 
1182 
1163 
1164 
1165 
1166 
1186 
1190 
1191 
11S2 
1193 
1194 
1165 
1196 
1167 
1196 
1199 
1200 
1201 
1202 
1203 
1204 
1205 
1206 
1206 
1210 
1211 
1212 
1214 
1215 
1216 
1217 
1218 
1219 
1220 
1221 
1222 
1223 
1224 
1225 
1226 
1227 
1226 
1229 
1230 
1231 
1232 
1233 
1234 
1235 
1236 
1237 
1238 
1239 
1240 
1241 
1242 
1243 
1244 
1245 



621 
1594 
637 
623 
665 
564 
££2 
536 
545 
1095 
1304 
1366 
1606 
1465 
1459 
1431 
1407 
1363 
1454 
1422 
1394 
1171 
1457 
666 
265 
403 
344 
505 
£72 
636 
637 
614 
637 
10S5 
1719 
791 
964 
313 
306 
320 
326 
394 
402 
366 
641 
660 
914 
673 
970 
1021 
1392 
1354 
1362 
673 
614 
603 
696 
707 
475 
466 
759 
1324 
1563 
1865 
1812 
1411 
1392 
794 
769 
740 
743 
713 
662 
663 
565 



1158 
664 
400 

367 
367 
£26 
£29 
£24 
£14 
£22 
566 
£39 
7C2 
224 
224 
223 
223 
224 
162 
163 
162 
214 
286 
1114 
693 
12S2 
1775 
1311 
1253 
1502 
1402 
1407 

1431 

1394 
1545 
668 
1021 
195 
194 
157 
167 
294 
294 
294 
329 
329 
266 
245 
372 
296 
205 
203 
205 
540 
542 
539 
623 
626 
447 
1282 
1461 
1170 
1005 
609 
617 
703 
662 

410 

407 
406 
511 
510 
509 
504 
582 



■13.7 
-3.5 
21.3 
-21.8 
•20.2 

24.4 

-25.0 
-25.9 
-25.5 
-10.2 
-7.5 
-6.6 
•3.3 
-4.6 
5.2 
-5.7 
-6.1 
-6.4 
-5.3 
£.8 
-6.3 
-6.2 
-S.2 
-16.5 
«-35.0 
-32.6 
O5.0 
-27.6 
-24.1 
-21.2 
-21.3 
-22.1 
-21.3 
-10.3 
-2.1 
-16.5 
-12.9 
<-35.0 
<-25.0 
<-35.0 
-35.0 
332 
-32.7 
-337 
-21.2 
-20.4 
-13.8 
-14 7 
-127 
-11.6 
-6.3 
-68 
-67 
-19.9 
-22.1 
-22.6 
-19.2 
-16.9 
-2B7 
-29.0 
-17.4 
-7.2 
•3.6 
-0.6 
•1.0 
-6.0 
-6.3 
-16.4 
-17.1 
•17.9 
-17.8 
-187 
-19.6 
-20.3 
•24.4 



24.700 
35.900 
66.400 
66,800 
66,700 
54.500 
54.500 
54.800 
55.700 
££.000 
5C.200 
£3.700 
43,400 
124,900 
124,900 
12£.10O 
125.200 
124,700 
164.400 
162.600 
164.300 
131.600 
94,200 
26.200 
34700 
20,000 
20.600 
16.400 
20.000 
13.000 
16,300 
16.200 
15.400 
16.600 
11.600 
45.200 
25,700 
1*6.700 
146,800 
147,400 
1*6.600 
91.400 
91,200 
91.400 
81,600 
81,600 
101.600 
112.000 
72,900 
90,100 
135.500 
141.800 
13S.5O0 
53.600 
53 400 
53.600 
47.800 
47,500 
62.300 
20.400 

14.400 

54.200 
3C.300 
36.200 
37.900 
43.400 

44,500 

66,900 
67.300 
67.500 
55.900 
56,000 
56,100 
56.500 
50.500 



547 

530 
516 
973 
607 
665 



1246 
1247 
1249 
1250 
1251 
1252 
1253 

1254 1311 

1255 1300 
1257 1936 

1256 1806 

1259 1727 

1260 1629 

1261 1555 

1262 1466 

1263 1413 

1264 1340 

1265 1263 

1266 1162 

1267 mo 

1268 10S5 

1269 999 



577. -2SJ 
576 -26.3 



1270 

1271 

1272 

1273 

1274 

1277 

1278 

1279 

1280 

1261 

1282 

1283 

1264 

1285 

1286 

1287 

1288 

1289 

1290 

1291 

1292 

1293 

1294 

1295 



959 

905 

e57 

610 

774 

737 

702 

671 

645 

617 

595 

573 

552 

536 

515 

496 

467 

447 

427 

412 

397 

381 

365 

348 



572 

536 

532 

529 

766 

746 

761 

712 

718 

715 

713 

717 

717 

722 

717 

717 

720 

717 

717 

717 

715 

712 

714 

705 

711 

708 

711 

710 

710 

707 

704 

700 

695 

694 

687 

663 

669 

667 

655 

655 

652 

654 

653 

653 



-27.0 
-12.7 
-22.4 
-20.2 
-14.1 
-7.4 
-7.5 
0.0 
-1.0 
-2.0 
-3.0 
-4.0 
-5.0 
-6.0 
-7.0 
-8.0 
-9.0 
-10.0 
-11.0 
-12.0 
-13.0 
-14.0 
-15.0 
-16.0 
-17.0 
-18.0 
-19.0 
-20.0 
-21.0 
-22.0 
-23.0 
-24.0 
-25.0 
-26.0 
-27.0 
-28.0 
-29.0 
-30.0 
-31.0 
-32.0 
-33.0 
-34.0 
-35.0 
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Tzble 4. Compuied p/*s of some known proteins relsitd ic measured 



Protein Name 

0 Creatine phosphokinase (CPK), rabbit muscle 
1 Fatty acid-binding protein, rat hepatic 
2 β2-microglobulin, human 
3 Carbamoyl-phosphate synthase, rat 
4 Prealbumin (serum albumin precursor), rat 
5 Serum albumin, rat 
6 Superoxide dismutase (Cu-Zn, SOD), rat 
7 Phospholipase C, phosphoinositide-specific (?), rat 
8 Albumin, human 
9 Apo A-I lipoprotein, rat 
10 proApo A-I lipoprotein, human 
11 NADPH cytochrome P-450 reductase, rat 
12 Retinol binding protein, human 
13 Actin beta, rat 
14 Actin gamma, rat 
15 Apo A-I lipoprotein, human 
16 Apo A-IV lipoprotein, human 
17 Tubulin alpha, rat 
18 F1 ATPase beta, bovine 
19 Tubulin beta, pig 
20 Protein disulphide isomerase (PDI), rat hepatic 
21 Cytochrome b5, rat 
22 Apo C-II lipoprotein, human 
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radioimmunoassay (RIA and immunoradiometric assav (IRMA)) JTT 
analyte concentrations above ca. 10^7 molecules/ml ... 

case of competitive or 'limited reagent' assays, from the manipulation errors in the 
system combined with the physicochemical characteristics of the particular antibody used; 
however, in the case of 'non-competitive' systems, the specific activity of the label may play a 
more important constraining role. It is theoretically demonstrable that the development of 
assay techniques yielding detection limits significantly lower than 10^7 molecules/ml depends 
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invoked 16 d,SCT ' mina,io " b ^een the products of the immunological reactions 

Chemiluminescent and fluorescent substances are capable of yielding higher specific activities 
than commonly used radioisotopes when used as direct reagent labels in this context ... 

... of 'ultra-sensitive', non-competitive immunoassay methodologies 
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generation of ambient analyte' microspot immunoassays permitting the simu "tsneZ 
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(for example, laser scanning techniques. Early experience!* ^Z^^^Si 
sensitivities surpassing that of isotopically based methodologies can readily be developed. 

Ultrasensitive immunoassay; fluorescent microspot immunoassay; confocal microscopy 
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- INTRODUCTION 

Immunoassay methods relying on radioisotopic 
labels have played a major role in medicine and 
other biologically related fields (agriculture, 
veterinary science, the food and pharmaceutical 
industries, etc.) during the past two decades. 
Their importance has derived from the exploitation 
both of the 'structural specificity' characterizing 
antibody-antigen reactions and the 'detectability' 
of isotopically-labelled reagents, the latter 
permitting observation of the binding reactions 
between exceedingly small concentrations of the 
reactants. The combination of these two properties 
has endowed radioimmunoassay 
methods with unique specificity and sensitivity 
characteristics, and accounts for their ubiquitous 
use throughout modern medicine and biology. 
However, in the past few years, interest has 
increasingly focused on so-called 'alternative', 
non-radioisotopic, immunoassay methods. Such 
techniques are based on essentially identical 
analytical principles but differ in the markers used 
to label the particular immunoreactant (antibody 
or analyte) whose distribution between bound 



R. EKINS, F. CHU AND J. MICALLEF 

fundamental reason for their replacement stems, 
paradoxically, from the current requirement to 
develop microanalytical techniques which are 
superior to them in this particular respect. 
Radioisotopic methods are, in practice, limited to 

the measurement of analyte concentrations above 
about 10^7-10^8 molecules/ml (i.e. approx. ... 

... Dakubu et al., 1984). However, in certain 
fields (e.g. virology, tumour detection) there is a 
particular need to detect or measure molecular 

... brief discussion in the present 



The concept of sensitivity 



(1) Environmental; logistic; economic; practicality 
and convenience, etc. (i.e. 'non-scientific' ... 
(2) The attainment of higher sensitivity. 
(3) The development of 'immunosensors' and 
'immunoprobes'. 
(4) The development of 'multi-analyte' assay ... 

... immunoassay development strategy in these areas. 



THE ATTAINMENT OF 'ULTRA-HIGH' 
IMMUNOASSAY SENSITIVITY 

Though, as indicated above, the sensitivity of 
radioisotopically based immunoassay methods 
has constituted one of the principal foundations 
of their widespread use over the past 25 years, a 



One major source of past confusion has been 
disagreement regarding the concept of 'sensitivity' 
... 

(see also Ekins et al., 1970b; Tait ...). ... 
widely agreed that the notion ... 

... erroneous ... implies greater sensitivity ... 

The invalidity of this belief is clearly 
revealed by the fact that the ... 

its lower limit of detection (Fig. 1(b)) and this 
concept is now embodied in ... 
agreed definitions of the term. An essentially 
identical definition is as the precision (i.e. 
standard deviation) of measurement of zero dose, 
since this quantity determines the least quantity 
distinguishable from zero and hence the assay 
detection limit. The sensitivity of an assay is thus 
represented by the zero-dose intercept of the 
'precision profile' (Fig. 2(a)) when the latter is 
expressed in terms of standard deviation rather 
than coefficient of variation (Ekins, ...). 

... the more sensitive of two assays is the one 
yielding greater precision of the zero dose 
estimate (Fig. 2(b)). 
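
The definition above (sensitivity as the standard deviation of the zero-dose measurement, i.e. the SD of the response at zero dose divided by the slope of the dose-response curve at zero dose) can be illustrated with a minimal numerical sketch. The function name, the replicate readings and the slopes are hypothetical values chosen for illustration only; they are not taken from the article.

```python
import statistics


def zero_dose_sensitivity(zero_dose_responses, slope_at_zero):
    """Estimate the SD of the dose measurement at zero dose.

    zero_dose_responses: replicate response readings with no analyte present.
    slope_at_zero: dR/dD of the dose-response curve at zero dose
                   (response units per dose unit); its sign is ignored.
    """
    if slope_at_zero == 0:
        raise ValueError("slope at zero dose must be non-zero")
    sd_response = statistics.stdev(zero_dose_responses)
    return sd_response / abs(slope_at_zero)


# Hypothetical assay A: shallow curve, tight replicates.
assay_a = zero_dose_sensitivity([1000, 1010, 990, 1005, 995], slope_at_zero=-50.0)
# Hypothetical assay B: four times steeper curve, but noisier replicates.
assay_b = zero_dose_sensitivity([1000, 1050, 950, 1025, 975], slope_at_zero=-200.0)

print(f"Assay A zero-dose SD: {assay_a:.3f} dose units")
print(f"Assay B zero-dose SD: {assay_b:.3f} dose units")
```

In this made-up case the assay with the steeper curve has the larger zero-dose SD, so it is the less sensitive of the two, which is the point the text makes about slope alone being an unreliable guide.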
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F/E plot 



[Figure 1 panel labels: B/F plot; Any plot; panels (a) and (b)] 

Figure 1. (a) Diagrammatic representation of conventional RIA dose-response curves for systems using high and low 
antibody concentrations plotted in terms of free/bound (F/B) and bound/free (B/F) labelled antigen ... 

... the use of a lower amount of antibody yields a dose-response curve of greater slope in the ... It is 
impossible to decide on the basis of the data ... the assay system of 
higher sensitivity. (b) The sensitivity of an assay is essentially represented by the ... of the 
dose measurement (SD_dose) at zero dose. This is ... 

curve slope at zero dose (i.e. (SD_response)_0 x (dD/dR)_0). This quantity is unaffected by the ... 

... unnecessary step merely adds to confusion ... 



'Competitive' and 'non-competitive' ('limited 
reagent' and 'excess reagent') assays 

A second important misconception in this area is 
the notion that immunoassays relying on the use 
of labelled antibodies (e.g. immunoradiometric 
assays, IRMA) are ipso facto more sensitive than 



those which rely on the use of labelled 'analyte' 
(e.g. radioimmunoassays, RIA); furthermore the 
grounds originally advanced for the claimed 
superiority of labelled antibody methods (Miles 
and Hales, 1968) were partially based on false 
concepts of sensitivity, and thus failed to identify 
the true reasons why certain assay designs are 
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100 
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Figure 2. (a) The precision profile of an assay portrays the error in the dose measurement as a function of dose The error 
may be represented, inter alia, by the absolute error (ΔD; e.g. SD of D) or the relative error (ΔD/D; e.g. CV of D). The 
error in the measurement of zero dose represents the sensitivity of the assay. The working range may be defined as the range 
of dose values within which ΔD/D is less than an 'acceptable' value set by the investigator. (b) The more sensitive of the 

... intercepts at a lower value. However, assay ... is more precise at higher values. 
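
The precision-profile idea in this caption can be sketched in a few lines: the zero-dose SD gives the sensitivity, and the working range is the span of doses whose relative error stays below a chosen limit. The doses, SDs and the 10% limit below are hypothetical, and the sketch assumes the acceptable doses form one contiguous range.

```python
def working_range(doses, dose_sds, max_cv):
    """Return (lowest, highest) dose whose relative error SD/D is <= max_cv.

    Doses of zero are skipped because the relative error is undefined there.
    Returns None when no dose meets the limit.
    """
    acceptable = [d for d, sd in zip(doses, dose_sds) if d > 0 and sd / d <= max_cv]
    if not acceptable:
        return None
    return min(acceptable), max(acceptable)


# Hypothetical precision profile: SD of the dose estimate at each dose.
doses = [0, 1, 2, 5, 10, 50, 100]
dose_sds = [0.2, 0.25, 0.3, 0.4, 0.6, 4.0, 15.0]

sensitivity = dose_sds[doses.index(0)]        # SD at zero dose
limits = working_range(doses, dose_sds, max_cv=0.10)

print(f"Sensitivity (zero-dose SD): {sensitivity}")
print(f"Working range at 10% CV: {limits}")
```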
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hv te £ ,y C fK Pab,e l>' ie,din ? <™ higher sensitiv- 
... others. This issue ... 
The purely pragmatic sub-classification of 
... 
analyte methods diverts attention from a more 
fundamental divide in immunoassay methodology 
which relates to the optimal concentration of 
antibody required in an assay system to maximize 
its sensitivity. In some (which may 
be termed 'limited reagent' or 'competitive') the 
optimal concentration tends to zero; conversely, in 
others (which may be termed 'excess reagent' or 
'non-competitive') the concentration tends to 
infinity. It should be particularly emphasized that 
the optimal antibody concentration is essentially 
governed not only by the physicochemical characteristics 
of the antibody-analyte binding reaction 
but also by the errors incurred in measurement of 
the assay response. Were an assay system to be 
totally error-free, no antibody concentration 
would be optimal, and the distinction between 
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distinction between these two form, «f • 
immunoassay simply reflects differences ... 

... by estimation ... 

... measurement of ... 

[Figure 3 panel labels: Measurement of occupied sites; Two-site labelled antibody assay] 

... chemical theory underlying this fundamental 

... immunoassay design ... 

et al., 1968, 1970a; Jackson et al., 1983), the reason 
for it can perhaps be more readily understood if 
the basic principles of immunoassay are portrayed 
in a somewhat different way from that in which 

they are generally presented. All immunoassays 
essentially depend upon measurement of the 
'fractional occupancy' by analyte of antibody 
binding sites following reaction of analyte with 

antibody (see Fig. 3(a)). Those techniques which 
rely on measurement of residual, 
unoccupied, binding sites optimally necessitate 
the use of concentrations of antibody tending to 
zero and may be termed 'competitive'; conversely, 
those in which occupied sites are directly 
measured necessitate use of high antibody concentrations 
and are termed 'non-competitive' 

(Fig. 3(b)). This emphasizes that the differences in 
assay design characterizing so-called competitive 
and non-competitive methods are essentially 
unrelated to which component (if any) of the 
reaction system is labelled. Indeed, assays 
in which no label of any kind is used can, on 
identical grounds, be subdivided into those of 
'limited reagent' (or 'competitive') and 'excess 
reagent' (or 'non-competitive') design. Thus the 
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0' Labelled antigen 

[Figure 3 key: Labelled antibody; Anti-idiotypic antibody; Analyte] 

... occupied sites of 
the (labelled) antibody are measured (below right); 'competitive' 
when unoccupied sites are measured using labelled 
antigen (below left) or labelled anti-idiotypic antibody 
methods (below centre) ... rely on measurement of sites 
unoccupied by analyte and are therefore invariably of 
'competitive' design. 
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'Competitive* immunoassay "Non-competitive* immunoassay 
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label 
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Fi S »'e 4. Curves show.ng the theoretically predicted relationship between antibody affinity and the sensitivities achievable 
using 'competitive' and 'non-competitive' assay strategies. The 'potential' sensitivity curves assume the use of infinite specific 
activity labels; the sensitivities achievable using 125I-labelled antigen or antibody are also shown. Shaded areas indicate the 
sensitivity loss due to errors in measurement of the label. Curves relating to competitive assays assume a 1% error in 



Conversely, when occupied sites are measured 
directly, this particular constraint does not arise; 
indeed, considerable advantage often derives 
from using relatively large amounts of antibody in 
the system. 



Sensitivity of 'competitive' and 
'non-competitive' immunoassays 

Competitive and non-competitive immunoassays 
differ significantly in many of their performance 
characteristics in consequence of the differences 
in optimal antibody concentration on which they 
rely. Most particularly they differ in their 
potential sensitivities. Figure 4 portrays the 
sensitivities predicted theoretically as a function 
of antibody binding affinity, making realistic 
assumptions regarding the experimental errors 
incurred in reagent manipulation, 'non-specific' 
binding of labelled antibody, etc., and assuming 
the use of optimal reagent concentrations (Ekins, 
1985). Amongst other concepts illustrated in the 
figure is the much greater assay sensitivity 
potentially attainable (using an antibody of given 
affinity) by adoption of a non-competitive 
approach. In short, whereas the maximal sensitivity 
realistically achievable using a competitive 
design is in the order of 10^7 molecules/ml (using 
antibody of the highest affinity found in practice), 
a non-competitive method is capable of yielding 
sensitivities some orders of magnitude greater 
than this. However, Fig. 4 also demonstrates that, 
assuming the use of high affinity antibodies (i.e. 
10^11-10^12 l/M), maximal sensitivities yielded by 
isotopically based techniques (whether relying on 
labelled antibody (IRMA) or labelled analyte 
(RIA), or whether of competitive or non-competitive 
design) are closely comparable, i.e. 
of the order of 10^7-10^8 molecules/ml. 
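
For readers more used to molar units, the detection limits quoted here in molecules/ml can be converted with Avogadro's number. The short sketch below is only a unit conversion; the function name is an illustrative choice and is not taken from the article.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_per_ml_to_molar(molecules_per_ml):
    """Convert a concentration in molecules/ml to mol/l."""
    molecules_per_litre = molecules_per_ml * 1000.0
    return molecules_per_litre / AVOGADRO


low = molecules_per_ml_to_molar(1e7)
high = molecules_per_ml_to_molar(1e8)
print(f"1e7 molecules/ml = {low:.2e} mol/l")
print(f"1e8 molecules/ml = {high:.2e} mol/l")
```

On this arithmetic, 10^7-10^8 molecules/ml corresponds to roughly 10^-14 to 10^-13 mol/l.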

This limitation is a manifestation of the fact 
that, in the case of the non-competitive methods, 
an important constraint on assay sensitivity is 
(under certain circumstances) the 'specific activ- 
ity' of the label used. On the other hand, 
limitation of assay sensitivity due to the low 
specific activity of radioisotopic labels does not 
often arise, in practice, in the case of competitive 
assays, whose sensitivity is generally restricted by 
other factors (Ekins, 1985). The fundamental 
significance of this conclusion is that, only by the 
use of labels possessing specific activities higher 
than those of the commonly used radioisotopes in 
assays of noncompetitive design, can current 
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.1 _sensit;vjty bmits.be breached, Converse!? „«- n f 
a higher specific activity label in a competitive 
assay will usually have no significant effect on 

sensitivity (assuming experimental errors incurred 
in reagent manipulation of the magnitude 
generally encountered in practice). 

High specific activity non-isotopic labels 

The term 'specific activity' is conventionally 
applied in the case of radioisotopic labels to 
denote the number of radioactive disintegrations 

per unit time per unit weight of the isotope or 

labelled compound. In the present context use of 
the term is widened to signify 'detectable events' 
per unit time per unit weight of labelled material. 
Thus it can be used to indicate the rate of photon 
emission by a chemiluminescent or fluorescent 
label, or the rate of conversion of substrate 
... by an enzyme label ... molecules of 
... product. The importance of the 
concept derives from the fact that 'signal 
measurement error' (i.e. error in the measurement 
of the label per se) is a contributor to the factors 
limiting assay sensitivity, and may, when other 
sensitivity-constraining factors are ..., 
become dominant. Furthermore, when extending 
the sensitivities of immunoassay systems beyond 
their present limits, the numbers of molecules 
involved are low, and statistical errors incurred in 
counting individual 'detectable events' ... 

yielding chemiluminescent signals or fluorescent 
products can be utilized. 
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_ ■ ■ 

Table 1. Relative specific activities of isotopic and 
non-isotopic labels. Note that, though 
the specific activity of 125I does 
not, in practice, significantly affect the sensitivity of 
competitive assays (see Fig. 4), the specific 
activity of 3H may severely limit that 
of competitive assays (e.g. of steroid hormones) 
which rely on the use of this particular radioisotope. 

Specific Activities 

125I:                     1 detectable event/sec/7.5 x 10^6 
                          labelled molecules. 

3H:                       1 detectable event/sec/5.6 x 10^8 
                          labelled molecules. 

Enzymes:                  Determined by enzyme 'amplification 
                          factor' and detectability of 
                          reaction product. 

Chemiluminescent labels:  1 detectable event/labelled molecule. 

Fluorescent labels:       Many detectable events/labelled 
                          molecule. 
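
The two radioisotope entries in Table 1 can be checked from the radioactive decay law: a label with half-life T decays at a rate of ln 2 / T per atom, so about T / ln 2 labelled molecules are needed for one disintegration per second. The sketch below assumes one radioactive atom per labelled molecule and 100% detection efficiency. The half-lives (about 59.4 days for 125I and 12.32 years for 3H) are standard physical constants supplied here as assumptions; they do not appear in the article.

```python
import math

SECONDS_PER_DAY = 86_400.0
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def molecules_per_event_per_second(half_life_seconds):
    """Labelled molecules needed to give one disintegration per second.

    Assumes one radioactive atom per labelled molecule and that every
    disintegration is detected.
    """
    decay_constant = math.log(2) / half_life_seconds  # decays per atom per second
    return 1.0 / decay_constant


# Assumed half-lives (not stated in the article).
i125 = molecules_per_event_per_second(59.4 * SECONDS_PER_DAY)
h3 = molecules_per_event_per_second(12.32 * SECONDS_PER_YEAR)

print(f"125I: 1 event/sec per {i125:.2e} labelled molecules")
print(f"3H:   1 event/sec per {h3:.2e} labelled molecules")
```

Under these assumptions the results come out close to 7.4 x 10^6 and 5.6 x 10^8 molecules, consistent with the table's entries.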



The importance of background in 
non-competrtive immunoassays 

A second important factor governing the sensitivity 
of non-competitive labelled-antibody immunoassays 
is the 'background' or 'blank' signal 
emitted in the absence of analyte, since error in 
the measurement of this signal is clearly a major 
determinant of the error in measurement of zero 

dose. Amongst contributors to the background 
signal are the 'noise' of the ... 

methods, cosmic ray and other ... 
components is essential for maximizing ... 

concentrations depicted in Fig. 5 (reproduced 
from Jackson et al., 1983). Thus ... 
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Figure 5. Assay sensitivity (represented by the standard 
deviation of the zero dose measurement, σ0), plotted as a 

function of the concentration of labelled antibody (affinity 
... L/M) used in the assay, assuming different levels of 
non-specific binding of labelled antibody. (... 

instrument background has been assumed in the computations 
represented; this limits the ultimate sensitivity attainable, 
regardless of the concentration of antibody used.) 



lower concentration is required to yield the same 
level of analyte binding, albeit with reduced 
non-specific binding, thus increasing assay sensitivity. 

In summary, the high sensitivity of non-competitive 
labelled antibody methods derives 
essentially from their permitted use of optimal 
concentrations of antibody which (provided non-specific 
binding of labelled antibody is low) 
are generally considerably greater than in competitive 
methods, not from the fact that the 
antibody is labelled. Labelled antibody methods 
generally fall in sensitivity as the concentration of 
antibody is reduced towards zero, ultimately 
yielding a sensitivity theoretically identical to that 

of competitive methods (Rodbard and Weiss, 
1973). (Paradoxically, early exponents of labelled 
antibody methods, whilst claiming them to be of 
higher sensitivity, also concluded that their 
sensitivity was increased by reduction in the 
amount of labelled antibody used (Woodhead et al., 
1971). This incorrect conclusion, based on 
observation of effects on the slope of the 
dose-response curve, exemplifies the many fallacies 
encountered in the immunoassay field stemming 
from confusion regarding the concept of 
sensitivity discussed above.) Finally it should be 



emphasized that maximization of the sensitivity of 
a non-competitive immunoassay generally implies 
the selection of reagent concentrations and other 
experimental conditions such that the [analyte 
signal/background] ratio (i.e. s/b) is maximized. 
However, this simple relationship disregards 
statistical considerations which arise when the 
numbers of detectable events are very low and a 
more appropriate objective may, under these 
circumstances, be maximization of the ratio s^2/b 
(Loevinger and Berman, 1951). 
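
The difference between the two objectives can be seen with a small sketch. If the background count b carries Poisson counting noise, its standard deviation is sqrt(b), so s^2/b is the square of the signal-to-noise ratio s/sqrt(b). The condition names and count values below are hypothetical and chosen only so that the two criteria disagree.

```python
def best_condition(conditions, criterion):
    """Pick the experimental condition maximizing a figure of merit.

    conditions: list of (name, signal_counts, background_counts).
    criterion:  "s/b" for signal/background, "s2/b" for signal squared/background.
    """
    if criterion not in ("s/b", "s2/b"):
        raise ValueError("criterion must be 's/b' or 's2/b'")

    def score(condition):
        _, s, b = condition
        if b <= 0:
            raise ValueError("background must be positive")
        return s / b if criterion == "s/b" else s * s / b

    return max(conditions, key=score)[0]


# Hypothetical counts for two candidate antibody concentrations.
conditions = [
    ("low antibody", 10.0, 1.0),     # s/b = 10, s^2/b = 100
    ("high antibody", 100.0, 20.0),  # s/b = 5,  s^2/b = 500
]

by_ratio = best_condition(conditions, "s/b")
by_counting_stats = best_condition(conditions, "s2/b")
print(f"Best by s/b:   {by_ratio}")
print(f"Best by s^2/b: {by_counting_stats}")
```

Here the lower antibody concentration wins on s/b, but the higher one wins on s^2/b because its larger signal outweighs the extra counting noise.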



Other performance characteristics of 
competitive and non-competitive 
immunoassays 

Non-competitive designs also display a number of 
other advantages deriving from the relatively high 
antibody concentrations on which they generally 
rely. These include increased reaction speeds 
(i.e. shorter incubation times), decreased 
vulnerability to certain environmental effects 
(which cause variations in binding affinity between 
antibody and analyte), and reduced 
dependence on high antibody binding affinity. 

Nevertheless a price has to be paid for these 
benefits; this includes the greater tendency of a 
large amount of antibody to bind molecules 
differing from, but with structural resemblance 
to, the analyte itself, implying a loss of assay 
specificity. This effect generally necessitates the 
use, whenever possible, of an 'immunoextraction' 
procedure using a second 'capture' antibody 
(usually directed against a different binding site 
or 'epitope') as shown in Fig. 3(b). This 
technique, the 'sandwich' or 'two-site' immunoassay 
(Wide, 1971), thus potentially combines 
the twin virtues of ultra-high sensitivity and 
specificity (together with short reaction time), 
features of crucial importance in many diagnostic 
situations (for example, in the detection of AIDS 
viral antigens). (Note, however, that the loss of 
specificity inherent in non-competitive assay 
designs implies that they are less readily applicable 
to the measurement of analytes of small 
molecular size, which cannot be simultaneously 
bound by two different antibodies directed 
against different antigenic sites on the molecule. 
Such analytes are generally more appropriately 
measured using 'competitive' assay methods.) 
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• "Development of ultra-sensitive ~ 
immunoassay methodologies 

The perception that the development of 'ultra-sensitive' 
immunoassay systems (i.e. systems 

... sensitivity) depends on (a) reliance on 'excess reagent' or 
'non-competitive' assay designs; (b) the use of 
non-isotopic labels displaying higher specific 
activities than commonly used radioisotopes; (c) 
the development of efficient separation systems 
(ensuring minimization of non-specific antibody 
binding, and hence of signal 'backgrounds'); and 
(d) dual or multi-antibody analyte-recognition 
systems (exemplified by 'sandwich' or two-site 
assays) to maintain/increase assay specificity, has 
formed the basis of our own laboratory's 
immunoassay development since the early ... 
1970s (Ekins, 1978). This led to 
immediate recognition (Ekins, 1979, 1980) of the 
importance of the in vitro techniques of monoclonal 
antibody production pioneered by Kohler 
and Milstein (1975), which are currently the 
subject of bitter patent disputes in the USA 
(Ezzell, 1986, 1987a,b), and which may be 
expected in Europe. 

Meanwhile, of the candidate labels for use in 
this context, both chemiluminescent and fluorescent 
labels offer many attractions. The development 
of stable, highly chemiluminescent acridinium 
esters by McCapra and his colleagues 
(McCapra et al., 1977) has subsequently been 
exploited by Weeks et al. (1983, 1984) and more 

recently by several commercial kit manufacturers. 

Others have used more conventional 
chemiluminescent compounds to label immuno... 

(... 1984, 1985; Barnard et al., 1985). Yet others have 
relied on enzyme labels to catalyse chemiluminogenic 
(Whitehead et al., 1983) and fluorogenic 

(..., 1980) reactions as indicated above. 
Detailed description of these various methodologies 
is presented by others in this volume and 
need not be duplicated here. 

Common to all the 'ultra-sensitive' immunoassay 
methodologies relying on such alternative 
labels is their dependence on a non-competitive, 
labelled antibody, assay strategy whenever 
appropriate; however, for the reasons indicated 
above, competitive methods continue to be 
generally employed for the measurement of 
analytes of small molecular size (e.g. therapeutic 
drugs, steroid and thyroid hormones, etc.). 
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Nevertheless, the convenience (from » ma „ / 
'unng ^viewpoint, and for othe'r SSi'^SS 
of relying on standard labelling procedures has 

... in these cases, labelled antibody 
techniques are increasingly preferred. Though the 
commercial kits based on these various labels 
differ to a minor extent in sensitivity, specificity, 
convenience, etc., such differences are at least 

... possibilities ... 
naturally included ... 

... 

... the lanthanide chelates 

(including, in particular, the chelates of europium, 
samarium and terbium) facilitate such 
development, possessing prolonged fluorescence 

... of the lanthanide chelates ... 

... characterized by a much smaller Stokes shift (~28 nm) 
and a fluorescent decay time ... 
spectrum which ... 

of the lanthanide chelates ... 



CHEMILUMINESCENT AND FLUORESCENT MARKERS 



measured in the presence of a fluorescence 
background (deriving from extraneous sources) 
which, in practice, approaches zero. Fig. 6 
illustrates the basic concepts involved in pulsed- 
light, time-resolved, fluorescence measurement 
which form the basis of the DELFIA immunoas- 
say system currently marketed by LKB/Wallac. 

Though it is inappropriate to pursue this 
subject in greater detail, attention should also be 
drawn to the possibilities offered by phase- 
resolved fluorimetry. This permits separate iden- 
tification of fluorophores differing in fluorescence 
lifetime by their exposure to a sinusoidally- 
modulated exciting light source, and observation 
of their demodulated, phase-shifted, light emis- 
sion (McGown and Bright, 1984). This technique 
offers the possibility both of the development of 
homogeneous assays (relying on a difference in 
fluorescence decay time of bound and free forms 
of the fluorescent-labelled molecule), and of 
discriminating between two labelled antibodies in 
the context of multi-analyte 'ratiometric' im- 
munoassay as discussed below. 

'AMBIENT ANALYTE' IMMUNOASSAY 

Before proceeding to a discussion of the develop- 
ment of multi-analyte assays, another important 
concept, termed 'ambient analyte immunoassay' 
(Ekins, 1983b), must first be examined. This 
term is intended to describe a type of immuno- 
assay system which, unlike conventional 
methods, measures the analyte concentration in 
the medium to which an antibody is exposed, 
being essentially independent both of sample 
volume and of the amount of antibody present. 
This concept is illustrated in Fig. 7, and relies on 
the physicochemically-based proposition that, 
when a 'vanishingly small' amount of antibody 
(preferably, but not essentially, coupled to a solid 
support) is exposed to an analyte-containing 
medium, the resulting (fractional) occupancy of 
antibody binding sites solely reflects the ambient 
analyte concentration. Clearly the binding by 
antibody of analyte results in a depletion of the 
amount of analyte in the surrounding medium, 
but provided the proportion so bound is small 
(i.e. less than, for example, 1% of the total), such 
disturbance can be ignored. (This effect is closely 
analogous to that caused by the introduction of a 
thermometer into a medium possessing a much 
larger thermal capacity; the temperature disturb- 
ance caused by the thermometer itself is negligi- 
ble and can, in these circumstances, be disre- 
garded.) 

Figure 6. Basic principles of pulsed-light, time-resolved 
fluorescence. Fluorescence emitted by the fluorophore (typi- 
cally a europium chelate) is distinguished from background 
fluorescence, which decays more rapidly. 

The principles of ambient analyte assay derive 
from the recognition that all immunoassays 
essentially depend upon measurement of the 
'fractional occupancy' by analyte of antibody 
binding sites following reaction of analyte with 
antibody, as discussed above (Figs 3(a) and (b)). 
The fractional occupancy of ('monospecific' or 
'monoclonal') antibody binding sites in the 
presence of varying analyte concentrations, plot- 
ted against antibody concentration, is portrayed 
in Fig. 8. The fraction of analyte bound is also 
plotted in this figure. (Note: for the sake of 
generality, all concentrations in this figure are 
expressed in terms of 1/K, where K is the affinity 
constant of the antibody. For example, if K = 
$10^{11}$ L/M, a concentration of 0.1 × 1/K represents 
$10^{-12}$ M/L = $0.1 \times 10^{-11} \times 6.02 \times 10^{20}$ 
= $6.02 \times 10^{8}$ molecules/ml.) 
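The unit convention in the note above can be checked with a short calculation. The following is a minimal sketch, not part of the original text: the function name and the use of Python are assumptions made for illustration, and Avogadro's constant is taken as 6.02 × 10^20 molecules/mmol, as in the note.

```python
# Illustrative sketch (hypothetical helper, not from the source text).
# Converts a concentration expressed in units of 1/K into molecules/ml.

AVOGADRO_PER_MMOL = 6.02e20  # molecules per millimole, value used in the note


def relative_to_molecules_per_ml(x, K):
    """Return the concentration x * (1/K) in molecules/ml.

    x : concentration in units of 1/K (dimensionless multiple)
    K : affinity constant in L/M (litres per mole)
    """
    molar = x / K  # mol/L, numerically equal to mmol/ml
    return molar * AVOGADRO_PER_MMOL


# The worked example of the note: K = 1e11 L/M, concentration 0.1 x 1/K
print(relative_to_molecules_per_ml(0.1, 1e11))
```

Because 1 mol/L equals 1 mmol/ml, multiplying the molar value by the number of molecules per millimole gives molecules/ml directly.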

It should be particularly noted that, at antibody 
concentrations of less than ca 0.01 × 1/K, antibody 
fractional occupancy is essentially dependent 
solely on the analyte concentration in the 
medium, and is independent of variations in 
antibody concentration. This reflects the fact that 
this concentration of antibody binds less than 
approximately 1% of the analyte in the medium 
irrespective of its concentration. This implies, for 
example, that the introduction of 10, 100 or 1000 
antibody molecules into a medium containing 
billions of analyte molecules will result, in each 
case, in virtually identical fractional antibody 
binding-site occupancy, the upper limit of anti- 
body concentration being determined by the 
antibody affinity constant. (An antibody concen- 
tration of 0.01 × 1/K is a hundred-fold less than 
that (1 × 1/K) necessary to bind 50% of a 'trace' 
amount of analyte (see Fig. 8), the value claimed by Berson 
and Yalow (1973) as maximizing assay 'sensitiv- 
ity' (i.e. the slope of the dose-response curve). 
This conclusion has subsequently been 
incorporated into the mythology of 
radioimmunoassay design, which regrettably the 
majority of kit manufacturers continue to accept.) 

The ambient analyte assay concept may, in 
particular, be utilized [...] 
immunosensors and immunoprobes. One such 
example is a probe for the measurement of 
salivary steroids that is currently being developed 
[...]. Comparable in size and 
shape to a clinical thermometer, this device is 
intended to permit the measurement of salivary 
[...] without requiring the collection of 
saliva. However, the concept also underlies our 
approach to multi-analyte immunoassay, also 
under development in our laboratory. 



[Figure 8 plot. Horizontal axis: antibody concentration. Vertical axes: binding-site occupancy (%) and analyte bound (%). The low antibody concentration region is labelled AAI and the high antibody concentration region RIA.] 

Figure 8. Fractional antibody binding-site occupancy plotted as a function of antibody binding-site concentration for 
different values of analyte (antigen) concentration. The percentage binding of analyte to antibody is also shown. All 
concentrations are expressed in units of 1/K. The percentage binding of analyte is <1% for antibody concentrations <0.01 × 1/K (approximately); 
[...] binding-site occupancy is essentially unaffected by antibody concentration extending over several orders of magnitude [...] 
other 'competitive' immunoassays [...] 
MULTI-ANALYTE 'RATIOMETRIC' 
IMMUNOASSAY SYSTEMS 

The concepts relating to ambient analyte im- 
munoassay and assay sensitivity outlined above 
are both exploited in our present development of 
a random access, multi-analyte immunoassay 
technology capable of measuring, in the same 
small sample, virtually any number of individual 
analytes from selected analyte 'menus' (e.g. a 
hormone menu, a viral antigen menu, an allergen 
menu, etc.). Many examples of a need to measure 
a multiplicity of different analytes in the same 
sample exist in medical diagnosis, for example in 
the routine diagnosis of thyroid disease, where it 
is frequently necessary to measure a number of 
different hormones and thyroid-related proteins. 
At present, clinicians frequently experience diffi- 
culty in deciding on the best sequence of tests to 
arrive at a correct diagnosis. Such problems 
would be overcome were all relevant analytes 
measurable at a cost comparable to the cost of 
measurement of a single substance. Our own 
immediate objective is the development of a 
technology permitting the measurement of com- 
plete [...] profiles using a single small blood 
sample. However, the need for 'multi-analyte' or 
random access measurement is not confined to 
medical diagnosis: it also arises, for example, in 
the pharmaceutical industry (where there exists a 
need to ensure the purity of protein drugs 
synthesized by recombinant DNA techniques), in 
the food industry and elsewhere. Though still at 
an early stage, our approach to the achievement 
of this objective can be briefly indicated. 

Multi-analyte assay: general principles 

As discussed above, the notion of ambient 
analyte assay simultaneously introduces two 
extremely important and novel concepts: (a) that 
an estimate of analyte concentration can be based 
upon the use of an infinitesimal amount of 
sampling antibody, and (b) that such an estimate 
derives from a direct measurement of fractional 
antibody occupancy by analyte, irrespective of 
the exact amount of antibody used. It should be 
emphasized that the latter proposition is valid 
only in the context of ambient analyte assay and 
is not true in current conventional immunoassay 
systems (in which fractional antibody occupancy 
depends both upon the amount of antibody in the 
system and upon sample volume; see Fig. 8). In [...], 
exposure of a small number of antibody mole- 
cules (in the form, for example, of a 'microspot' 
located on a solid support) to an analyte- 
containing fluid results in occupancy of antibody 
binding sites in the microspot reflecting the 
analyte concentration in the medium. Following 
such exposure, the antibody-bearing probe may 
be removed and exposed to a 'developing' 
solution containing a high concentration of an 
appropriate second antibody directed against 
either a second epitope on the analyte molecule if 
this is large (i.e. the occupied site), or against 
unoccupied antibody binding sites in the case of 
small analyte molecules (see Fig. 3(b)). (Note: an 
antibody simulating antigen, and reacting with 
unoccupied binding sites, is described as a 
'mirror-image anti-idiotypic antibody'; the use of 
such an antibody instead of labelled antigen is 
convenient but not essential, and is suggested 
here merely to simplify illustration of the basic 
concepts involved.) 
Subsequently, an estimate of binding-site occu- 
pancy of the 'sampling' (solid phase) antibody 
located in the microspot may be derived by 
measurement of the ratio of signals emitted by the 
two antibodies forming the dual-antibody 'coup- 
lets'. This can be conveniently achieved by 
labelling the 'sampling' and 'developing' anti- 
bodies with different labels, for example, pairs of 
radioactive, enzyme or chemiluminescent mar- 
kers. Fluorescent labels are nevertheless particu- 
larly useful in this context because, by the use of 
optical scanning techniques, they permit arrays of 
different antibody 'microspots' distributed over a 
surface, each directed against a different analyte, 
to be individually examined, thus enabling 
multiple assays to be simultaneously carried out 
on the same small sample. Fig. 9 illustrates these 
basic ideas, and Fig. 10 such an array. 



Microspot immunoassay sensitivity: 
theoretical considerations 

The proposition that it is, in principle, possible to 
measure an analyte concentration using a micros- 
pot of antibody comprising a number of antibody 
molecules in the range ca lO'-lO* is likely at first 
sight to appear surprising, and may indeed 
provoke scepticism regarding the assay sensitivi- 
ties potentially attainable using this approach. 
Clearly a number of factors, such as the sensitivity 
Figure 9. Basic principle of dual-label, ambient-analyte immunoassay relying on fluorescent-labelled antibodies. The ratio of 
α and β fluorescent photons emitted reflects [...] (see Figs 5 and 6) and is solely dependent on the analyte 
concentration to which the probe has been exposed. 

Figure 10. 'Multi-analyte' antibody array. Each antibody 
'microspot' represents a 'vanishingly small' amount of 
antibody directed against an individual analyte. 



an.ibodv binding si «s ° f sensi "S 

l.v) coaled »i* S^^S- '^ito™- 

ml- The molernl«; ™ ncen,ral,on ^ molecules/ 
the C m ° leC "' ar conc entrat,on of antibody in 

Iht ' thuS g,Ven b - v ADlv - (Note- the fact 

that anubody is situated on the surface nf 7 . J 
[Figure 11 diagram labels. Surface area: A sq. µm. Antibody density: D molecules/sq. µm. Antibody affinity: K L/M. Avogadro's number: N molecules/M.] 

Figure 11. Microspot ambient-analyte immunoassay. The microspot is [...] 
to be uniformly coated with antibody, though if the dual-labelled antibody [...] 
uniform coating is not essential. The fluid volume for ambient analyte assay conditions 
[...] is shown. 
Minimum test sample volume (ml): $A \times D \times K \times 10^{5}/N$ 



given by the equation: 

$F^2 - F(1/q + p/q + 1) + p/q = 0$ (1) 

where p = analyte concentration, q = antibody 
concentration (both expressed in units of 1/K). 
Thus, for antibody binding site concentrations 
→ 0 (i.e. q < 0.01), $F \approx p/(1 + p)$ (see Fig. 8). 
Likewise the fraction of analyte bound by 
antibody (f) at equilibrium is given by the equation: 

$f^2 - f(1/p + q/p + 1) + q/p = 0$ (2) 

Thus, for analyte concentration → 0 (i.e. p < 
0.01), $f \approx q/(1 + q)$ (see Fig. 8). Furthermore, 
when q < 0.01, and when p ≈ 0, f < 0.01. 

Expressed in units of 1/K, the concentration (q) 
in the assay of 'sensing' antibody situated on the 
microspot is given by $DAK/(v \times 6 \times 10^{20})$ (since 
Avogadro's constant, expressed as the number of 
molecules/mmol, is $6 \times 10^{20}$ (approximately)). 
The fraction of an analyte concentration → 0 
which will be bound to the spot is therefore 
$DAK/(v \times 6 \times 10^{20} + DAK)$, implying that the 
number of analyte molecules bound to the spot is 
given by $vCDAK/(v \times 6 \times 10^{20} + DAK)$. 
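The behaviour of Eqs (1) and (2) in the 'ambient analyte' regime can be explored numerically. The sketch below is illustrative only: the function names are hypothetical, and the physically meaningful (smaller) root of each quadratic is assumed, since a fraction cannot exceed 1.

```python
# Illustrative sketch (hypothetical names, not from the source text).
# Solves Eqs (1) and (2) with p and q both expressed in units of 1/K.
import math


def fractional_occupancy(p, q):
    """Fraction F of antibody sites occupied: smaller root of
    F^2 - F(1/q + p/q + 1) + p/q = 0   (Eq. 1)."""
    b = 1.0 / q + p / q + 1.0
    c = p / q
    # Smaller root written as 2c / (b + sqrt(b^2 - 4c)) to avoid
    # cancellation when q is very small.
    return 2.0 * c / (b + math.sqrt(b * b - 4.0 * c))


def fraction_analyte_bound(p, q):
    """Fraction f of analyte bound: Eq. (2) is Eq. (1) with p and q swapped."""
    return fractional_occupancy(q, p)


# With p = 1 the limiting occupancy p/(1 + p) is 0.5; occupancy stays close
# to this value while the antibody concentration q changes 10,000-fold.
for q in (1e-6, 1e-4, 1e-2):
    print(q, fractional_occupancy(1.0, q), fraction_analyte_bound(1.0, q))
```

Under these assumptions the occupancy barely moves as q varies below 0.01, while the fraction of analyte bound remains under 1%, which is the condition the text describes.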

Case 1: sandwich (two-site) assay. Following 
incubation of sample with antibody, we assume 
the sample is removed, and the microspot then 
exposed to a volume V (ml) of a solution of a 
second, labelled, 'developing' antibody of affinity 
K' (L/M) at a concentration given by Q 
(expressed in units of 1/K'). 

The fraction of analyte bound by labelled 
antibody (f') at equilibrium is given by the 
equation: 

$f'^2 - f'(1/P + Q/P + 1) + Q/P = 0$ (3) 

where P represents the analyte concentration in 
the developing-antibody solution, expressed in 
units of 1/K', i.e. 
$vCDAKK'/[(v \times 6 \times 10^{20} + DAK)\,V \times 6 \times 10^{20}]$. 

Assuming P < 0.01, $f' \approx Q/(1 + Q)$. (For 
example, if Q = 1, the fraction of analyte 
molecules bound by labelled antibody = 0.5 
approximately.) Thus, since the number of 
analyte molecules bound to the spot is given by 
$vCDAK/(v \times 6 \times 10^{20} + DAK)$, the number of 
analyte molecules labelled by the second, de- 
veloping, antibody is given by 
$vCDAKQ/[(v \times 6 \times 10^{20} + DAK)(1 + Q)]$ and the surface density 
of such molecules is given by 
$vCDKQ/[(v \times 6 \times 10^{20} + DAK)(1 + Q)]$. Moreover, assuming that 
$DAK \ll v \times 6 \times 10^{20}$ (i.e. that the amount of 
antibody in the system is such that 'ambient assay' 
conditions prevail), then the surface density (D*) 
of developing-antibody molecules = 
$CDKQ/[(6 \times 10^{20})(1 + Q)]$ approximately. It should be 
noted that D* is independent of both v and V; 
also that the ratio $D^*/D = C \times KQ/[(6 \times 10^{20})(1 + Q)]$ 
= C × constant. 
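The proportionality between the ratio D*/D and the analyte concentration C is what allows a measured ratio to be read back as a concentration. A minimal sketch follows; the function names are hypothetical, and it assumes that the measured signal ratio has already been calibrated to a ratio of surface densities, a step the text does not describe.

```python
# Illustrative sketch (hypothetical names, not from the source text).
# Relates the surface-density ratio D*/D to analyte concentration C
# under 'ambient analyte' conditions (sandwich design).

N_MMOL = 6e20  # Avogadro's constant in molecules/mmol, as used in the text


def surface_density_ratio(C, K, Q):
    """D*/D for analyte concentration C (molecules/ml), sensing-antibody
    affinity K (L/M) and developing-antibody concentration Q (units of 1/K')."""
    return C * K * Q / (N_MMOL * (1.0 + Q))


def concentration_from_ratio(ratio, K, Q):
    """Invert the relation above: C (molecules/ml) from a ratio D*/D."""
    return ratio * N_MMOL * (1.0 + Q) / (K * Q)


ratio = surface_density_ratio(1e8, 1e11, 1.0)
print(ratio, concentration_from_ratio(ratio, 1e11, 1.0))
```

Neither the sample volume v, the developing volume V, nor the spot area A appears in either function, reflecting the independence noted in the text.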

If the minimum detectable surface density of 
developing-antibody molecules (i.e. the 
standard deviation of the measurement of D* 
when C = 0) is given by $D^*_{min}$ (molecules/µm²) 
and $C_{min}$ represents the minimum detectable 
analyte concentration in the test sample, then, 
disregarding non-specific binding of developing 
antibody within the microspot area, 

$C_{min} = D^*_{min} \times [(6 \times 10^{20})(1 + Q)]/DKQ$ (4) 

For example, if Q = 1, D = $10^{5}$ molecules/µm², K 
= $10^{11}$ L/M and $D^*_{min}$ = 20 molecules/µm², then 
$C_{min}$ = $2.4 \times 10^{6}$ molecules/ml = $4 \times 10^{-15}$ M/L. It 
should be noted that, in this example, the fractional 
occupancy of the sensing antibody binding sites 
by the minimum detectable analyte concentration 
is 0.04%. 
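The worked example for Eq. (4) can be reproduced in a few lines. This is a sketch under the parameter values quoted in the text; the function names are hypothetical and the occupancy is computed from the limiting form p/(1 + p) of Eq. (1).

```python
# Illustrative sketch (hypothetical names, not from the source text).
# Reproduces the sandwich-assay example of Eq. (4).

N_MMOL = 6e20  # molecules/mmol, as used in the text


def c_min_sandwich(d_star_min, D, K, Q):
    """Minimum detectable concentration (molecules/ml), Eq. (4).

    d_star_min : minimum detectable developing-antibody density (molecules/um^2)
    D          : sensing-antibody density (molecules/um^2)
    K          : sensing-antibody affinity (L/M)
    Q          : developing-antibody concentration (units of 1/K')
    """
    return d_star_min * N_MMOL * (1.0 + Q) / (D * K * Q)


def to_molar(molecules_per_ml):
    """molecules/ml -> mol/L (mmol/ml and mol/L are numerically equal)."""
    return molecules_per_ml / N_MMOL


def occupancy_at(C, K):
    """Limiting fractional occupancy p/(1 + p), with p = C*K/N_MMOL."""
    p = C * K / N_MMOL
    return p / (1.0 + p)


c = c_min_sandwich(20.0, 1e5, 1e11, 1.0)
print(c, to_molar(c), occupancy_at(c, 1e11))
```

With the quoted values the three printed quantities correspond to the concentration in molecules/ml, the same concentration in molar terms, and the fractional occupancy of the sensing antibody at the detection limit.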



Case 2: anti-idiotypic antibody ('competitive') 
assay. In this case, we assume that, following 
removal of the sample, the microspot is exposed 
to a volume V (ml) of a solution of (for example) a 
second, labelled, anti-idiotypic antibody reacting 
with unoccupied sites on the sensing antibody. 
Using similar reasoning as above, we may 
likewise assume that the fraction of such sites 
which become occupied by the anti-idiotypic 
'developing' antibody is given by Q/(1 + Q), 
where Q is the developing-antibody concentra- 
tion. However, the minimum detectable surface 
density of anti-idiotypic antibody is not, in a 
competitive design, the critical determinant of 
assay sensitivity; this parameter is essentially 
governed by the precision of the density measure- 
ment. 

From Eq. (1), the fraction of sites unoccupied 
by analyte = 1/(1 + p), and the fraction occupied 
by anti-idiotypic antibody = Q/[(1 + p)(1 + Q)]. 
Thus, if the CV in the measurement of anti- 
idiotypic antibody is ε, the standard deviation is 
εQ/[(1 + p)(1 + Q)]. This term also represents the 
SD in the estimate of the fraction of sites occupied 
by analyte. Since the total number of antibody 
binding sites in the spot is DA, the SD in the 
estimate of occupied sites as p → 0 
approximates εDAQ/(1 + Q); the SD in the 
corresponding surface-density estimate is thus 
εDQ/(1 + Q). The SD in the measurement of 
fractional binding-site occupancy when p → 0 
defines $D^*_{min}$, and hence the minimum detectable 
analyte concentration in the test sample as 
indicated in Eq. (4). 
Thus 

$C_{min} = D^*_{min} \times [(6 \times 10^{20})(1 + Q)]/DKQ$ (5) 

$= [\varepsilon DQ/(1 + Q)] \times [(6 \times 10^{20})(1 + Q)]/DKQ$ (6) 

$= (\varepsilon/K) \times (6 \times 10^{20})$ (7) 

For example, if values of Q = 1, D = $10^{5}$ 
molecules/µm², and K = $10^{11}$ L/M are assumed, as 
in the non-competitive example considered 
above, and the CV in the measurement of 
anti-idiotypic antibody density in the microspot is 
1% (i.e. ε = 0.01), then $D^*_{min}$ = 500 molecules/µm² 
and $C_{min}$ = $6 \times 10^{7}$ molecules/ml = 
$10^{-13}$ M/L. The fractional occupancy of the sensing 
antibody binding sites by the minimum detectable 
analyte concentration is, in this example, 1%. It 
should be noted that the sensitivity limit of ε/K 
(expressed in molar terms) is identical to that 
previously established for conventional 'competi- 
tive' assays (Ekins and Newman, 1970) and 
which underlies the predictions represented in 
Such considerations appear to suggest (a) that 
assay sensitivities superior to those 
obtainable by conventional radioisotopically- 
based immunoassays are achievable, and (b) that 
the sensitivities yielded by non-competitive microspot 
assays are likely to be considerably greater than 
those of corresponding competitive microspot 
assays. It must be emphasized, however, that 
though such predictions are likely to prove 
[...], the performance 
of the labels and signal-measuring instruments 
used are not incorporated in the simple theoretical 
analysis discussed above. Such factors are clearly 
of importance in determining overall microspot 
immunoassay performance. 
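The competitive-design limit of Eqs (5) to (7) can be checked against the example values in the same way. The sketch below uses hypothetical function names and the parameter values quoted in the text (ε = 0.01, D = 10^5 molecules/µm², K = 10^11 L/M, Q = 1); it also confirms that substituting the competitive value of D*min into Eq. (5) gives the same result as Eq. (7).

```python
# Illustrative sketch (hypothetical names, not from the source text).
# Competitive (anti-idiotypic) design, Eqs (5)-(7).

N_MMOL = 6e20  # molecules/mmol, as used in the text


def d_star_min_competitive(eps, D, Q):
    """SD of the surface-density estimate as p -> 0: eps * D * Q / (1 + Q)."""
    return eps * D * Q / (1.0 + Q)


def c_min_competitive(eps, K):
    """Eq. (7): minimum detectable concentration in molecules/ml."""
    return eps * N_MMOL / K


def c_min_from_eq5(d_star_min, D, K, Q):
    """Eq. (5): same form as Eq. (4), used here as a cross-check."""
    return d_star_min * N_MMOL * (1.0 + Q) / (D * K * Q)


eps, D, K, Q = 0.01, 1e5, 1e11, 1.0
d_min = d_star_min_competitive(eps, D, Q)
print(d_min, c_min_competitive(eps, K), c_min_from_eq5(d_min, D, K, Q))
```

Eq. (7) contains neither D nor Q, so under these assumptions the competitive limit depends only on measurement precision and antibody affinity.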



Practical implementation 

The concepts discussed above are clearly exploit- 
able using a variety of antibody labels, including 
[...] labels; however, our prelimin- 
ary studies have been based on the use of 
conventional fluorophores, since the technology 
of simultaneous measurement of dual fluoresc- 
ence from small areas is already well established. 
Because this volume centres on chemiluminesc- 
ence, we shall provide only a brief indication of 
our initial experimental work in this area, which is 
currently based on the use of commercially 
available confocal microscopes. 

Instrumentation: the laser scanning confocal 
microscope. In laser scanning confocal fluoresc- 
ence microscopy, a small area of the specimen is 
illuminated by a focused laser beam; the fluoresc- 
ence photons emanating solely from this area are, 
in turn, focused onto a photon detector. Both the 
intensity of illumination and the efficiency of light 
collection diminish rapidly with distance from the 
focal plane (Fig. 12). At the 'confocal' point the 
projection of the illumination pinhole and the 
back-projection of the detector pinhole coincide. 
Such systems contrast with conventional epi- 
fluorescence methods, where the specimen is 
exposed to an essentially uniform flux of illumina- 
tion (White et al., 1987). 
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Sensitivity of current instruments. Typically 
fluorescence photons emanating from the laser- 



[Figure 12 diagram labels: laser; objective; object in focal plane; object not in focal plane.] 

Figure 12. Principle of the confocal microscope. Illuminating 
light is focused on a point in the focal plane. Reflected light 
from this point is focused onto a detector. A complete 
two-dimensional image of structures within the focal plane is 
built up by scanning the selected area of interest, and may 
be stored in a microcomputer for video display. 



emitteAy the £^^2 
contribute lo the background JX Tl 

f ^e Z. ^^Z^IZS^ 

already display very high sensitivity in the detection 
of fluorescent signals. For example, the confocal 
microscope manufactured by Zeiss is claimed to 
display a lower detection limit [...] of 
about ten molecules/µm² (Ploem, [...]). 
Commercially available FITC-labelled IgG has 
a fluorophore/protein molar ratio of [...]; thus the 
detection limit ($D^*_{min}$) of the Zeiss microscope is 
~2-3 FITC-labelled IgG molecules/µm². This 

01 . « x ju molecules/ml for a two-site assav 
assuming the same parameter values a used fn 
the examples discussed above, or 2 4 x : iS 
modules/ml using a 'sensing' antibody of affinity 

Another comparable instrument is the Bio-Rad 
[...] laser scanning confocal micro- 
scope, which we are currently using in the 
development of 'ratiometric' multi-analyte assay 
methodology in accordance with the principles 
outlined above (see Fig. 13). The argon laser in 
this system possesses two excitation lines, at 488 
and 514 nm. It is thus particularly efficient for the 
[...] 
such as FITC (which displays an excitation 
[...]) [...] 
fluorophores such as [...] 
[...]. However, the [...] im- 
munoassay principle permits considerable varia- 
tion in detection efficiencies of the two labelled 
antibody species forming the [...] 
signal ratios in the region of unity. The 
inefficiency of the argon laser in exciting red- 
emitting fluorophores is not necessarily a major 
handicap in the present context. 

Though the current Lasersharp instrument 
relies on a conventional microscope rather than a 
purpose-designed optical system (and appears to 
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sfiape and size for optimum adjusimem to .h. 
space program e,c, such rechnob y Kbe £ 



[Figure 13 diagram labels: beam splitters; objective; antibody array; antibody microspot.] 

Figure 13. Dual-channel confocal fluorescence [...] 
permitting simultaneous measurement of [...] 
signals from two fluorophores [...] point. By 
scanning the antibody array, the ratio of signals from each 
[...] 

be less sensitive), it permits quantification of 
fluorescence signals generated from microspots of 
selected area. Initial studies have revealed that, 
under conditions that are not necessarily optimal, 
the instrument is capable of detecting approx- 
imately twenty-five [...] molecules/ 
µm², scanning an area of ~50 µm² (Fig. 1?). It 
must be stressed that neither of these confocal 
microscopes is designed specifically for 
ratiometric multi-analyte immunoassay use, and 
it can be anticipated that future instruments 
constructed specifically for this purpose are likely 
to prove both cheaper and more sensitive. 

Other instruments. The MPM 200 Microscope 
Photometer manufactured by Zeiss 



Solid antibody supports. On the basis of the 
theoretical considerations discussed above, the 

proxies (for e^mole^ iKif E£? 
eSine^r^Lrr"^ 3 ! 8 ; 00 ^- We h ™ 

si g „ a ,-,o-„oise ra,r^d nCt haTt re C 
bee d „ e p r ov,s,o„ a ,, y used „ our 

immunological activity (Fig. appear ,0 reta,n 

[...] of the 'ratiometric' immunoassay con- 
cept. Our primary intention, in initial studies, has 
been establishment of the basic conditions which, 
using a particular instrument, can be anticipated 
on theoretical grounds to yield [...] 
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thought it useful to confirm the validity of our 
general concepts by comparing the performance 
of certain assays when constructed in microspot 
format and when conventionally designed. For 
example, we have compared a dual-labelled 
tumour necrosis factor (TNF) ratiometric assay 
system using Texas red and FITC-labelled anti- 
bodies with an optimized IRMA system using 
identical antibodies but with the second antibody 
125I-labelled. Although unoptimized, the 
ratiometric microspot assay yielded formal sensi- 
tivity values closely approaching that of the 
conventional, optimized, IRMA. Although 
these results verify the general concepts underlying 
ratiometric microspot immunoassay methodolo- 
gy, further work is required to achieve the 
considerably greater sensitivity that theory pre- 
dicts as achievable using optimized reagent 
concentrations and improved instrumentation. 



CONCLUSION 

As indicated above, differentiation of the fluores- 
cent signals yielded by two fluorophores can be 
readily achieved solely on the basis of wavelength 
differences, and this approach has been relied on 
entirely in our preliminary studies. However, 



other physical techniques exploiting differences in 
decay time of two or more fluorescence emissions 
(using, for example, a pulsed or sinusoidally 
modulated laser source, and time- or phase- 
resolving detectors) are available, and can be 
expected both to further reduce background and 
to improve signal resolution, thus increasing assay 
sensitivity and precision. These considerations 
aside, the basic technology involved closely 
resembles that employed in domestic compact 
disk recorders and other similar data-storage 
devices, the obvious difference being that light 
emitted from each of the discrete zones forming 
the antibody-array is fluorescent rather than 
reflected, and yields chemical rather than physical 
information. Indeed, our preliminary studies 
suggest that highly sensitive immunoassays using 
antibody microspots of surface area approximat- 
ing 50 µm² are achievable, implying that some 
2,000,000 different immunoassays could, in prin- 
ciple, be accommodated on a surface area of 
1 cm². Though non-specific binding of a multiplic- 
ity of developing antibodies would probably 
prohibit the use of antibody arrays of this order, it 
is evident that the technology is capable of 
encompassing analyte numbers of the kind likely 
to be useful in practice. 

The development of multi-analyte assay sys- 
tems of this kind can be anticipated to bring about 
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fundamental changes in medical diagnosis and 
many other biologically related areas. Systems 
capable of measuring every hormone and other 
endocrinologically related substance within a 
single small sample of blood are within technolo- 
gical reach, providing data which, when analysed 
with the aid of computer-based 'expert' pattern- 
recognition systems, are likely to reveal endoc- 
rine deficiencies only dimly perceived using 
current 'single-analyte' diagnostic procedures. 
Such systems also provide a means to the 
development of a 'random access' immunoassay 
methodology, permitting the selection of any 
desired test or combination of tests from an 
extensive analyte menu. Clearly the accommoda- 
tion of a wide range of individual immunoassays 
on a small immunoprobe (comparable in its 
overall physical dimensions with a few drops of 
blood) is likely to totally transform the logistics of 
immunodiagnostic testing, and genuinely repre- 
sents, in our view, 'next generation' immunoassay 
methodology. 
Throughout the 1970s, controversy centered both on im- 
munoassay "sensitivity" per se and on the relative sensi- 
tivities of labeled antibody (Ab) and labeled analyte meth- 
ods. Our theoretical studies revealed that RIA sensitivities 
could be surpassed only by the use of very high-specific- 
activity nonisotopic labels in "noncompetitive" designs, 
preferably with monoclonal antibodies. The time-resolved 
fluorescence methodology known as DELFIA, developed in 
collaboration with LKB/Wallac, represented the first com- 
mercial "ultrasensitive" nonisotopic technique based on 
these theoretical insights, the same concepts being sub- 
sequently adopted in comparable methodologies relying 
on the use of chemiluminescent and enzyme labels. How- 
ever, high-specific-activity labels also permit the develop- 
ment of "multianalyte" immunoassay systems combining 
ultrasensitivity with the simultaneous measurement of tens, 
hundreds, or thousands of analytes in a small biological 
sample. This possibility relies on simple, albeit hitherto- 
unexploited, physicochemical concepts. The first is that all 
immunoassays rely on the measurement of Ab occupancy 
by analyte. The second is that, provided the Ab concentra- 
tion used is "vanishingly small," fractional Ab occupancy is 
independent of both Ab concentration and sample volume. 
This leads to the notion of "ratiometric" immunoassay, 
involving measurement of the ratio of signals (e.g., fluores- 
cent signals) emitted by two labeled Abs, the first (a 
"sensor" Ab) deposited as a microspot on a solid support, 
the second (a "developing" Ab) directed against either 
occupied or unoccupied binding sites of the sensor Ab. Our 
preliminary studies of this approach have relied on a 
dual-channel scanning-laser confocal microscope, permit- 
ting microspots of area 100 µm² or less to be analyzed, 
and implying that an array of $10^{6}$ Ab-containing microspots, 
each directed against a different analyte, could, in princi- 
ple, be accommodated on an area of 1 cm². Although 
measurement of such analyte numbers is unlikely ever to 
be required, the ability to analyze biological fluids for a wide 
spectrum of analytes is likely to transform immunodiagnos- 
tics in the next decade. 



Additional Keyphrases: ratiometric immunoassays · scanning [...] 

Immunoassay and other protein-binding assay meth- 
ods based on the use of radioisotopic labels have played 
a major role in medicine during the past three decades. 
Their utility and importance have derived primarily 
from the structural specificity of many reactions be- 
tween binding proteins and analytes and the detectabil- 
ity of isotopically labeled reagents, the latter endowing 
such techniques with "exquisite sensitivity." Recently, 
[...] nonisotopic techniques based on identical analytical princi- 
ples, differing only in the nature of the marker used to 
label the reagent (e.g., antibody or antigen), whose 
distribution between reacted ("bound") and unreacted 
("free") fractions constitutes the assay "response." 
The basic aims underlying this interest can be 
broadly classed under four main headings: 

• avoidance of the environmental, legal, economic, and 
practical disadvantages of isotopic techniques (e.g., lim- 
ited shelf life of isotopically labeled reagents, problems 
of radioactive waste disposal, cost and complexity of 
radioisotope counting equipment), particularly those 
impeding the development of, for example, simple diag- 
nostic kits for home or doctor's office use; 

• achievement of greater assay sensitivity; 

• [...] measurement of analyte concentrations by 
use of transducer-based "immunosensors"; 

• simultaneous measurement of multiple analytes 
("multianalyte assay"). 

In this presentation I will focus primarily on the last 
of these objectives, using this to set out the principles 
[...] to develop a new "min- 
iaturized" technology that will permit the simultaneous 
measurement of an unlimited number of analytes in a 
small biological sample such as a single drop of blood. 
However, retention (and, if possible, improvement) of 
the high sensitivities of conventional isotopic tech- 
niques is a basic aim not only of our own studies in this 
area but also of most other endeavors falling under the 
above headings. It is therefore appropriate to preface 
this paper with a discussion of the general principles 
underlying the attainment of high binding-assay sensitivity. 
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Immunoassay Sensitivity: Some Basic Concepts 
Definition of Assay Sensitivity 

The need to establish assay conditions yielding max- 
imal sensitivity underlay the independent construction 
of mathematical theories of immunoassay design by 
both Yalow and Berson (1) and Ekins et al. (2) in the 
course of the original development of these methods in 
the early 1960s. Regrettably, these theoretical studies 
led to a prolonged controversy, arising largely from the 
conflicting concepts of "sensitivity" adopted by the two 
groups (see Figure 1). Briefly, Berson and Yalow, in 
their many publications relating to immunoassay de- 
sign (e.g., 1, 3), defined sensitivity as the slope of the 
Fig. 1. The differing concepts of sensitivity and precision underlying
radioimmunoassay design theories developed by (left) Yalow and
Berson (e.g., 1, 3) and (right) Ekins et al. (2, 4)
Yalow and Berson define assay A as more sensitive because it yields a
response curve of greater slope; Ekins et al. define assay B as more
sensitive because the imprecision of measurement of zero dose (σ0) is less. Yalow and
Berson likewise define an assay system as more precise if it yields a steeper
response curve when data are plotted on a log dose scale

response curve relating the fraction or percentage of
labeled antigen bound (b) to analyte concentration ([H]).
In contrast, Ekins et al. (e.g., 2, 4) defined sensitivity as
the (im)precision of measurement of zero dose, this
quantity being indicative of, and essentially equivalent
to, the lower limit of detection.

The key difference between these two definitions 
clearly lies in the dependence of the assay detection 
limit on the error (imprecision) in the measurement of 
the response variable. By neglecting this crucial factor 
the "response curve slope" definition leads to many 
obvious absurdities. For example, plotting conventional
RIA data in terms of the response metameter B/F (i.e.,
the bound to free ratio) suggests that assay "sensitivity"
is increased by increasing the antibody concentration in
the system; however, the converse conclusion is reached
if identical data are plotted in terms of F/B (see Figure
2). Observation of the shape and slopes of response 
curves without detailed error analysis thus constitutes a 
totally misleading guide to optimal immunoassay de- 
sign. This approach has, however, characterized many 
of the studies conducted in the immunoassay field dur- 
ing the past 30 years, and has been the source of much 



mythology. For example, consideration of the Law of
Mass Action reveals that, when curves corre-
sponding to different antibody concentrations are plot-
ted in terms of b vs [H], the maximal slope at zero dose
is obtained for a concentration of 0.5/K (where K is the
affinity constant), in which circumstance the zero dose
response (b0) is 33%. This conclusion led to Berson and
Yalow's enunciation of the well-known dictum (which,
albeit erroneous, is broadly adhered to by many immu-
noassay practitioners and kit manufacturers) that, to
maximize RIA sensitivity, the amount of antibody to use
in the system is that which binds 33% of labeled antigen
in the absence of unlabeled antigen (1, 3).
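
The slope argument behind the 33% dictum can be checked numerically. The sketch below assumes simple single-site mass-action binding with the labeled antigen present at trace concentration; under that assumption the zero-dose bound fraction is b0 = K[Ab]/(1 + K[Ab]) and the initial slope of the b versus [H] curve has magnitude K·b0·(1 - b0)². The function names and the grid of antibody concentrations are illustrative and are not taken from the original analyses. The sketch reproduces only the slope-maximizing condition; as the text stresses, maximal slope is not the same thing as maximal sensitivity.

```python
def zero_dose_bound_fraction(k_ab):
    """Bound fraction of trace label at zero dose; k_ab = K*[Ab] (dimensionless)."""
    return k_ab / (1.0 + k_ab)


def initial_slope_magnitude(k_ab, K=1.0):
    """|db/d[H]| at zero dose for single-site mass-action binding."""
    b0 = zero_dose_bound_fraction(k_ab)
    return K * b0 * (1.0 - b0) ** 2


# Scan antibody concentrations from 0.01/K to 5/K in steps of 0.01/K.
grid = [i / 100.0 for i in range(1, 501)]
steepest = max(grid, key=initial_slope_magnitude)

print(steepest)                            # antibody concentration in units of 1/K
print(zero_dose_bound_fraction(steepest))  # zero-dose bound fraction at that point
```

On this grid the steepest zero-dose slope falls at an antibody concentration of 0.5/K, where one third of the label is bound, which is the condition quoted above.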

Disagreement regarding the concept of sensitivity 
inevitably led to prolonged dispute regarding immu- 
noassay design (5). However, although it is still common 
to encounter publications in the field that rely solely on 
the response curve slope as a measure of sensitivity, the 
assay detection limit is now widely accepted as the only 
valid indicator of this parameter, and we do not there-
fore intend to dwell further on this issue here. It is
nevertheless relevant to an understanding of the "min-
iaturized" assay methodology described below to empha-
size that untenable concepts of both sensitivity and
precision underlie many of the commonly accepted rules 
governing current immunoassay-design practice, some 
of which are contravened in our own approach. 




Response curve slope Detection limit 



Fig. 2. Schematic representation of RIA dose-response curves
observed for high and low antibody concentrations plotted in terms of
(left) the free/bound fraction (F/B); (center) the bound/free fraction
(B/F)
Note the low antibody concentration yields a response curve of greater
slope when the assay response is plotted in terms of F/B, of lesser slope
when plotted in terms of B/F. The precision of measurement of zero dose
(σD0) is independent of the coordinate frame used to plot assay data (see



Basic Immunoassay Designs 

It is likewise important in the present context to 
comprehend the basis of the various types of immunoas- 
says currently in use, and the constraints on the sensi- 
tivities of which they are potentially capable. The radio- 
immunoassay and analogous protein-binding assay 
techniques originally developed for the measurement of 
insulin by Yalow and Berson (6), and of thyroxin and 
vitamin B12 by Ekins and Barakat (7, 8), relied on the
use of a labeled analyte marker to reveal the products of
the binding reactions between analyte and binder (Fig-
ure 3, left). This approach has subsequently often been
portrayed as relying on "competition" between labeled
and unlabeled analyte molecules for a limited number of 
protein-binding sites, such assays being frequently re- 
ferred to as "competitive." 

Subsequently, Wide et al. in Sweden (9), followed
shortly by Miles and Hales in the UK (10), developed
labeled antibody methods (Figure 3, right). These meth-
ods represented an extension of the "labeled reagent"
methods (utilizing radiolabeled organic compounds such
as I-labeled p-iodosulfonyl chloride, [3H]acetic anhy-
dride, and other similar reagents) devised, during the
early 1950s, by Keston et al. (11), Avivi et al. (12), and
others for quantifying amino acids, steroid and thyroid
hormones, etc. Although radiolabeled antibody methods
(immunoradiometric assays; IRMAs) were originally
claimed (13) to be more sensitive than methods based on
the use of radiolabeled analyte, these claims were sup-
ported by neither rigorous theoretical analysis nor per-
suasive experimental evidence, and for some time re-
mained controversial. Further doubt on their validity
[Figure 3. analyte + antibody* ⇌ analyte:antibody* (B) + residual antibody* (F); measure "fraction bound" (B) or measure "fraction free" (F).]




SSSi^^r fift?*" 1 b * ^bard and Weiss in 
f ^ Ied , the0retical demons^ 
that both labeled analyte and labeled Mtibody ineSS? 
Possessed essentially equal sensitivities flX ?TW 
authors suggested that irmas might be n5^££E 

mowporabon into the antigen molecule was restrict 

rically acre sensitive than ihi^JZ^ ^1 

^^^^^^ 
not really a consequence of the labeling LbSy ^ 

between labded-analyte and labeled^tibody" JS 
Averts attention from the true reasons 3"l £ 
superior sensitivity of certain assav d^W -r? , 
analysis (see, e.g 4 ,5) reveX 
feet- separation of the products of the binfr^Le&L 

^FlEf"** ° fb0Und ^ Set) tne 
optimal antibody concentration (for masiinal 

»ty) m a labeled analyte immunoassavin^k?^ J" 

to zero, irrespective of whether^S 

analyte fraction is measured, wTe^^tS^ 

body methods the optima] antibodv^L!!?,? 

Pends on which hb^^^S^S^T t 

measured, the optimal concentration also tends to zero.
Conversely, if the analyte-bound fraction is measured,
the concentration tends to infinity. In short, of the four
basic measurement strategies available-la£led aT 

u*e of antibodv concentrations a DB roachin B i££f. 
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«- in the^nrcontS: t^fct 1^ 
labeled reagent of any kind is involved 

^nal assay sensitivity, itself 
mg reasons for the existence of this divert 
desigr*, and may thus be misleadmg^ ^£ 
may be more readily understood if thebaic ^TncinW 



The "Antibody Occupancy Principle" of Immunoassay



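\[ \frac{[\mathrm{AbAg}]}{[\mathrm{fAb}]} = K[\mathrm{fAg}] \tag{1} \]

and fractional occupancy of antibody binding sites,

\[ \frac{[\mathrm{AbAg}]}{[\mathrm{Ab}]} = \frac{K[\mathrm{fAg}]}{1 + K[\mathrm{fAg}]} \tag{2} \]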

where [AbAg], [Ab], [fAb], and [fAg] represent the
concentrations (at equilibrium) tf £23 

JW. and fractional -^f^g J 



\[ \frac{[\mathrm{AbAg}]}{[\mathrm{Ab}]} = \frac{K[\mathrm{Ag}]}{1 + K[\mathrm{Ag}]} \tag{3} \]



Assays utilizing this concept have been termed "am-
bient analyte immunoassays," fractional occupancy
being independent of both sample volume and antibody
concentration (see below).
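
As a rough check on when Equation 3 is an adequate approximation, the sketch below solves the single-site mass-action equilibrium exactly for given total antibody and total analyte concentrations and compares the resulting occupancy with the limiting form. The affinity constant and the concentration grid are hypothetical values chosen only for illustration, and the function names are not from the source.

```python
import math


def fractional_occupancy(total_ab, total_ag, K):
    """Fraction of antibody sites occupied at equilibrium (single-site mass action).

    Solves K = x / ((total_ab - x) * (total_ag - x)) for the complex
    concentration x, using the numerically stable form of the smaller root.
    """
    b = K * (total_ab + total_ag) + 1.0
    disc = b * b - 4.0 * K * K * total_ab * total_ag
    x = 2.0 * K * total_ab * total_ag / (b + math.sqrt(disc))
    return x / total_ab


def ambient_occupancy(total_ag, K):
    """Limiting occupancy from Equation 3, valid as [Ab] tends to zero."""
    return K * total_ag / (1.0 + K * total_ag)


K = 1.0e11  # L/mol, hypothetical affinity constant
for ag in (0.01 / K, 1.0 / K, 100.0 / K):
    for ab in (0.001 / K, 0.01 / K, 1.0 / K):
        exact = fractional_occupancy(ab, ag, K)
        print(f"[Ag]={ag:.1e} [Ab]={ab:.1e} exact={exact:.4f} "
              f"Eq.3={ambient_occupancy(ag, K):.4f}")
```

With antibody at or below 0.01/K the exact occupancy stays within about 1% of the Equation 3 value across this range of analyte concentrations, whereas at 1/K the two differ substantially because the antibody depletes the analyte.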

m^l^^Z r Dtially depend « —ire. 
ment of the "fractional occupancy- of the senaor anti- 

feLte3 re^n^nt ^ ^ 
antibody bina^^s^tes^ (fromwl^ch^ai^^^ ^occupied 
^atiyZuced ^ub^oTn^^S 
attiunment of maximal senBitiviry-tte^Tof atn^I 
antibody concentrations tending^ ^ 
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■"o^WMPrrmve immunoassay- 



may therefore be cateffon»w4 « 

verady, techniques tttj^JSES 

measured permit On principle) Ih. » iSSSS 

between two la^uStiti^^ ***** 
These concepts are illustrated in Fieur* s »w v 
portrays basic immunoassay fnrm.r flgure , 6 « wh "b 
mon use. Convention^ STiKK * «n. 

^yte-tecbauquea^^^^ ^led- 
binding sites, generally by b^TJ*?"™ CUpUd 
taneous or sequential] I withl^iS* t,on , {e,th « -haul- 
idiotypic «5SSft£5f wftT^' ** anti - 
on the sensor ^0^^^^ 
Purpose. In the case of .LZi? ii?f?J 0r £ e J 8axne 
says, the labeled antib^teeS w£f ^ 
"tibody; after mctio^i^^fS^ ^ 
body may be separated into S2S^" eM0r 
fractions through use of (ee) atX? ^occupied 
prising antiacn^T " !' 1 "^^osorbant (com- 

one measures the label*) o««k^T , Uonver «Iy, if 

a., a* -Jft^S^^*-^ 

assay is "competitive."

Two-site "sandwich" assav* nr. *u 1 
because they rely on CSS^ST 
ered from two points of view F^Z^f ** COnfiid - 

the -lid-pha^^tibody^ U 
antibody, with th* l«K.i2r!!.f e ! arded :^ «nsor" 
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component (if any) of the r*o^ 

i. lb. „f t^^^JL'"^ 

sore," no component is >«k*iZT uwrH,a f efl ramunosen- 

on whether a bSJ^ ^ Ke^K^ 0 ^ 
or unoccupied antibody bS siS sl^ 1 ^ 
surface. In short th. . lea «™ated on its 
Petitive" mere£ reflet "^Petative" and "noncom- 

eites and lead to ^eSTthl ^ 
required TS^tffXSS^ 
dom errors ansing in the dSaSltal 
Compebtave and noncompetitivrl™ 
be shown to differ sitmifir^H, ^unoasaays can 
mance chsracS^^^ rf P* 5 ^ 
both types of assays hot! SS? . 8ensitivitie «- I" 
antibody and the ^a^^^^^^ 

. — , ^ ^ nBMmd , tant in detorniiiu^ 6e Litr^S k label ^ 1In P 0 » , * 

antibody, with the labeled anticXnlbW^^' Ae ^^^^Sve^v^ 6 ^ iD , Pnc **> 
Pied sensor^ntibody bindinf site? u ft ^ • < S' by the affinj > "^fortTantZ^^ ^ 
Seen from this viewpoint* £l£ specific activity of the liW^^^ ^ 

competitive syJtems. bti e^t?^ » 
or "manipulation" error in M^' e*penmental' 
«rc-dose response £75 . t? me asurement of the 
arising from p^trS 1^' 2? relaUve C,TOr ( «VRo) 
Eluding the ? J&tSZZ^^ 



Seen^ STS^^ ^^hed. 
olaasedasrnoiMompetitive " 8&Says ^ be 
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ae] is of key importance in determining «potantia]" 

ing the specific activity of the label to be infinite 
implying aero error in signal measurement). Thus the' 
potential sensitivity of a competitive aaaay can be 
ahown tobe wbereaatfct of a noncompetitive^ 

aaaay la given^y WAbJSR, where, in thflaS 
case, ^assumed to represent the labeled antibody 

nonspeofically bound" antibody. Thus JUAbl - f tZ 
fraction of labeled antibody Lt is ^c^Sy 
bounded Roa^AblCTo = f^. As^2 
the relative error (σR0/R0) in the measurement of the
zero-dose response is approximately identical for both
competitive and noncompetitive assays, it is evident
from this simple analysis that the potential sensitivity
of noncompetitive methods is greater than that of com-
petitive methods by the reciprocal of the fraction of
labeled antibody that is nonspecifically bound. For
example, if the nonspecifically bound fraction is 0.01%,
a noncompetitive strategy is potentially capable of a
sensitivity 10 000-fold greater than that of a competi-
tive approach, other factors being equal.
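
The arithmetic of this comparison is short enough to sketch. The helper below simply takes the advantage of the noncompetitive design to be 1/f, where f is the nonspecifically bound fraction of labeled antibody, with every other factor held equal; the function name is illustrative and the 1% case is added only for contrast with the 0.01% example in the text.

```python
def noncompetitive_advantage(nonspecific_fraction):
    """Fold gain in potential sensitivity of a noncompetitive over a
    competitive design, taken as 1/f, where f is the fraction of labeled
    antibody that is nonspecifically bound (other factors equal)."""
    if not 0.0 < nonspecific_fraction <= 1.0:
        raise ValueError("nonspecific_fraction must lie in (0, 1]")
    return 1.0 / nonspecific_fraction


for percent in (1.0, 0.01):
    f = percent / 100.0
    print(f"{percent}% nonspecific binding -> {noncompetitive_advantage(f):,.0f}-fold")
```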
These findings are summarised in Figure 6 (left)

pressed in terma of molecules per milliliter) and anti- 



[Figure 6 panels: Competitive; Non-competitive. Axis units: molecules/mL.]
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Implying «M th. m m 0» mtu^L It^Ly*** 

*»«*•). «d» touts in (MSlfctoln t» JhEifc m ?^* ehl ~« ,fc > 

~^»i*w«a£eE53^^ 

(MPB^ oah) vd ooin ftev oin^TZi^vX^r*"* antibody of 1% 

ZSrHS ^JZI^SSL^^l ■*« • ***** 

nonoomp.mfct bnmunoaum tw^on'-iiST, !? 1 5* v,, ** "P 0 "* *» 



l^il!^ """H** ™*led ana- 
2 we assume (o) the iiaenf. 

a label, the radioactivity of the ^le. W «Z2 
for 1 mm. Computations of ttaTSwtfaff SSJ 
reagent cpneentrationa (on which ^Satifns ^ 

able m such an assay is oyKb, where Z is the «ffi»*l 

m consequence 0 f counting radioactive S£l*TI 
finite time implies a loss of aaaay e^nsitiv^aTsh^ 
by the upper curve in Figure 6 (left). However*? 

antibodies withamnitii <iJMS 

toesTfl ^ ??* sample »untni 

times of 1-6 nun, httle improvement in LnsSvTtyfc 
gained hf usmg alternative labels of v lw i~ ™ " 
activities than ,3 »I HnZ!. , kgher B P eafi c 

The other main conclusions stemmin* fi™ ««u 
anal yaif are the importance cfbthSX^ 
ulabon" errors and using antibodiesTS: 
affinity. For example.TmSatt^ £SL£2?* 
an approximate threefold loss in t^ly 
landing the fact that an aaaay reoptiS fa 
to the deterioration in operator skillthaXS 
imply would utilize less antibody and , 
thereby partially ofisettingT^utcl ^ ^ 
Pipetting. But the « ? ^^'3SS^ 
from the analyais is the near impossibility in DraS* 

Z£$V a rrrz acts 

The results of a similar analysis of the aemriti*- 
imitations applying to noncompetitive^ \t%£EZ 

C of^ ^n n^' to the assum^ 

Uom of 1% and 0.01% nonspecific binding of labeled 

TJfm^^^y Such anal. 

a^y d J^ ?t th?^ CODduaionB releva °t ^> 
«"»y aesign, e.g., the crucial importance f redu«™ 



important conclusion! i7 *v rfth en«t 

methods, wM*fte»u^^ bMed 
activities of isotopes such £ % friw/J^ 8pedfic 
to senses of^^^^f^P^ce 

more. In short, although, under or 
noncompetitive iWSybV^^T, arcumst «'c 88| 
than corresponding RU^il ^ ^ "naitive 
of the sam/antib^y^ e^S? 1 ^ * e 
tial advantages (^1^^° P— 
rtive approach can be reahzed Z\l X noD »nipet. 
labels of much higheT^!^^ 8 aonisoto P ic 

ar* combined with h^^S? wh "> they 

Rgure 6 den^nstrateX^enw^^ h ° Wever ' 
with affinities of about ^ <>' antibodies 

»-y yield a substantial LZ^!;.™™^ 1 ^ 

These theoretical ccacSET 8ensiti **y- 

Kcation by K6hleT«dlS%? ,e f r * e **" 

vitro production of mmSS „ SjS aethod8 ofin 

tutedthebasisofmylab^S 'onsti- 

ment (initiated S meWi^™^ 6 dcve V 

ufacturer LK^WaCof ^ ^ 601 ^ 

immunoassay methodoWy t fluorome ^^ 

M Has me'thodoCwl 0 ^ fe^*"* <». 
n maotopiciinmunoaasayinetlkL i ^"•••Mitive" 
™e same basic ^SS^^^V^ 
adopted by n.anyoth^ufl^^^ ^ 

of hagh-apedfic activity lab* 8 Vanety 

defc^h^^ - ~ to the 
^ *™sa£ id^^^^ 



Tbe recognition that *11 fcJ! 

Potentially ^PortS ^^aaTS 3 ' !"* *« 
aamunoaasay US), m^^Sl^^ 
assay systems that, unliwL^ 7^ to de8crib « 
™* the analyte c^SSSS^^ 
an antibody is exposeTEZf - ^ medjum *o which 
Pie vol l une a nd7^ a ^^ depe ? de f t «** of san> 
Possibility of deve WTT? P«*«t The 

Law of M^ass ASot^e^^V ^ 
t>on, representine the SSi , lowing equa- 

- FftMAb]) + ([AaMAb]) + 1} + TAdH Ab) . q m 
vanoua uiJm. — —LJT^* : Btts 1,1 ">« Presence of 

body ^^ITrnT^J-^ ***** «*• • 

less than (say) 0 OMrS? ^centration of 

toanandyte^Sgm^uS t r^ rt, , b< ^ J 
bona!) occupancy of^Jw ?™\ the siting Cfrac- 



Ambient Analyte Immunoassay 

Particular attention has been Hr.— v 
specious notion that an anting, ^ above to the 
bating 0.5/JTis mJS^S^ST^ 0 8 PP™- 
conventional labe^^^™^ 16 activity of 
^Plicitly overturned by S*^^ 1 E? aiW « » 
"nmunoassays. which w Te^t^f^T**"*' 
a new generation of H*^^»g*g 

73 x 10* labeled fl>o/eculea 

™* per ^ 



Enzyme label 

Chemiluminescent label
fluorescent label 



fleets the S; ^^ 8it T ^ - 
^dependent of the tc^ ^?° , °i ^y^ 1 and is 
(If. for exlpleTjfTS t * ^ 

bindingHrite concentration of 0 01/JT ' ^ 
10-" mol/L, or 6.02 x ■ presents ° 0l x 

bidding by^tiboT JJi 8,te8/iaL -) Analyte 

Ivte in the med^^^becl^fr ° f (unbou ^) 
saall, the resulting reducti^^ 6 a f° Unt ^und i, 

conrentrationofbindin^S^iK exan, P J e. if the 
<0.01/JT, analyte deptoK » 
<», -d the ^ i.^^JJ-J 

^ ^* convention hu^nT^^.^lined to iaSf 
T»ey do not refer to tnoU, ~1 e ^ to deriving eooattZTj 

frwtional occupancy curZVzl* 1 (dunena '°«JeM) unit 
eal for o2/ antibodies tf tKb w«Tr ^ d °? *J«»«tioo 4 w» 




ioe term "ambient" ia nJj • ... 
pancy reflects the tally* L^?J ndlcale anUbody oca. 
?■ t« «« exposed, nXSSS "nbodXS 
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Antibody concentration *'* 
RQ. 7. Fractional antibody bfodlno-sha otturv-^ #r 

All oonosntraoorB art imnnd In un*. . 

and Bdfaon M W 30%h m with the pmcepts of 

dent of sample volume. 

These conclusions lead to two f t wv^ ± „ 
the antibody may be confin^ £ ^ZZ^ 
support, such that the 

site, within the nuc^a^^^^ 
~ the sample volume towhich th, ZXL * • 0 




-(WW- 2. l,:3l/ST 

n -23/N0. 4862209286 P 

maxiinuB, number of biodiiui sites th«f -,•« 

hgible disturbance (<l*T^i! ^ ^ MU8e »<*• 

Metric,- a ouaUabel, 
Dual-Label Microspot Immunoassay 

« 8, left). the proS S^^T ( «* Fi *" 
solution contain^ a remo / e ?. and «P°wd to a 

the -t^T^^ 

(Figure 8 tLT) Th. f*Z ^ molecule. 

different lab^Tg a £Z «l ^TV***" 
chemilu^ninescent tiw . Wh08 f fa t Ve ' en ™^ « 
differentna^uo^nlir ^ ° f 
Ocularly ^ i^S^S^^ 



rmau m (■« mm* I - - 
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* 0n ~compmtrtr*t tfsrwi^ torn 

Fig. 9. Basic principle of dual-label ambient analyte immunoassay
relying on fluorescent-labeled antibodies

&^&'£tttt£Z2^ »• «*• of f <«• 

advantages stem from adopW a a 
measurement. For exampl "i£SL fluorescence 
distention of the -~ «£S^T?f Br 
field of view is important, iSSTS^ 
emitted fluorescent d SSS^? 1**° of 
tuations in the intenawfk ■ T* 4 Llkewise - Auc- 
beam are apTto ^HtSe^^ «■* 
tages are additional to ie baS^f ^ 686 a<W 
approach, i.e., tl^t tL „eL^ r *"» 
stancy of the amount o/^^Tf'?^ ensurin S con- 
assay system is reeved * the 

Microspot Immunoassay Sensitivity 

ap^chis^^^p^^^le b y thi » 
sition that nao^X^y b^at^ P ^ 
bve as conventional aystem^Xt lely oTLtT™" 
amounts of antibody may readSv J? ^ lw * er 
consideration of a model system I^f demonstra *^ by 
sensor antibody moSL^l *t L M P ° 8tul,lt * 

• solid suppoiri^^btdtS^ 8UAceof 
exposed to the analyte^dTT* ♦f^ g J" 63 remai » 
anaJyte is thereb^^Sa^^^^ for the 
tion in the system-S^W 
support divided by the ii^^ n ? 0 ■ ^ °* *• 
by such attachment, andantfESt^S^" 1 * unaffected 
at equilibrium ^ U^^LT^ * — 
antibody is distributed uniformly thZZFTS* * ** 
bat™ mirfure.) Let us tZS? 
molecules east* aj a u«if«— . tte 

aim) are unbbelei iCKS *5* 

the surface ana over Xk fl^K^ cb *"« ,! »» 
If. for eiainpk STuSSi. .« 'i 0 * " touted. 
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- - - 

a surface area of io» «, 0 . 
. antibody binding aitemZj^l }. ""^ "commodates 

trabon of 0.0I/JT etc ,»* jf rrM P ond » *> a coneen. 

ing analvte at * 10 a aednun contain- 

we ffi e^t * ^ «*• 6 * " 
sultm^tibody ^u^?r? mPetitiVel7 " ^ «* 
on* labeled, ^^JS^ST * * 
analyte, forming a tnill 7~£Z ™? ed &punBt 
* u* supper tit^^S? 7 
developing**' iL^l" 168 "«* with the : 
specificaV to fl?' Iattcr 8130 binding W 
density of 1 mole^eW. ^ 8t ***** 

We may now consider the _r 
reduction of the anSv^lT^? rf a P ro 8«aaive 
1 mm' (effective st^SS ^ 8Ulfaee area (e*) 

area is 2 99 x lrt'T T iecu f e8 specifically bound to the 
molec^« preUt) wK ^u 50 * ° fthe total'^ 

restricted to the arJTwhS g "^ument is 
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specrfcaffy^ labe led antibody within th* ins** 
inent's field of view), the signal/noise ratio observdfe 

mm» «ea is 9.02 x 10-, the number f labeleTanti! 
body molecules specifically bound to the area is 5.41 x 
the number nonspedficaDy bound is l(r\ and the 
bi^mw ratio is -64. Likewise, the signal/noise 

be shown to bH££ 
short, the signal/noise ratio increases as the antibody-
coated surface area is decreased, approaching a maxi-
mal (plateau) value of 60 as the area coated with
antibody falls below 0.01 mm².
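
The qualitative behavior described here, a signal/noise ratio that rises and then plateaus as the coated area shrinks, can be illustrated with a toy model. The sketch below is not the calculation used in the text: every default parameter value is hypothetical, each occupied sensor site is assumed to bind exactly one labeled developing antibody, nonspecific binding is assumed to occur at a fixed surface density, and the instrument's field of view is assumed to match the coated area.

```python
import math

AVOGADRO = 6.022e23


def fractional_occupancy(total_ab, total_ag, K):
    """Equilibrium occupancy of antibody sites (single-site mass action)."""
    b = K * (total_ab + total_ag) + 1.0
    disc = b * b - 4.0 * K * K * total_ab * total_ag
    x = 2.0 * K * total_ab * total_ag / (b + math.sqrt(disc))
    return x / total_ab


def signal_to_noise(area_mm2, sites_per_mm2=1.0e10, volume_l=1.0e-3,
                    K=1.0e11, analyte_molar=1.0e-13, nonspecific_per_mm2=1.0e6):
    """Specific/nonspecific developing-antibody ratio for a coated area.

    All default values are hypothetical illustrations.
    """
    sites = sites_per_mm2 * area_mm2
    effective_ab = sites / AVOGADRO / volume_l   # mol/L if spread through the sample
    occupancy = fractional_occupancy(effective_ab, analyte_molar, K)
    signal = sites * occupancy                   # specifically bound label
    noise = nonspecific_per_mm2 * area_mm2       # nonspecifically bound label
    return signal / noise


areas = [100.0, 10.0, 1.0, 0.1, 0.01, 0.001]     # mm^2
ratios = [signal_to_noise(a) for a in areas]
for a, r in zip(areas, ratios):
    print(f"{a:>7} mm^2  S/N = {r:6.1f}")
```

In this model the ratio climbs as the area falls because less analyte is sequestered, and it levels off once the effective antibody concentration is well below 1/K, at which point occupancy is set by the ambient analyte concentration alone.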
If, however, a reduction in the antibody-coated area
were not accompanied by a corresponding reduction in
the detecting instrument's field of view, the
reduction in "signal" would not lead to a corresponding
decrease in the background Igentn^J iST^V 

crease the fraehonal occupancy of the sensor antibody 
fte »gna^o«e ratio might either remain constant 
these circumstances it might be advantageous to 
increase the coated area. Similarly ifih* j 
■* of sensor antibody were^LS 

and remained constant regardless rfti^lf^ ' 
field of *u «*anuess oi the mstrument's 

-*i Wow ,Mch CSSStf M 

reduce the antibody<oated surface «t» 0 ^j^^ 
tajly, the aensor^tibody cWtrS^T m 
although little advantage I likTtoTcnrS.^1^ 10 ' 
«• the area below oT n^Ti^Tth^S^ 
concentration below 0.0MO. IMQ ttM antibody 

: Were the microspot area indeed reduce , iV 

: I and noise would n*£S?S& 
ratio between them nevertheless remaining eale^tiaJW 

the hnut, be recorded In practice, other statistical 

events (e.g., photons) observed by a defer*;,.. iJ^T 
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^^^^^^ 

area. T*ua, given labels of veryluS 

° ne « e ™*» circumstances * wScTevef X 
"noncompetitive- system, the optimal concen^^ nf 
wnsor antibody may be «ceeSlow A™£ ? 
^conclusion is that a 

characteristics of the instruments used L m „ 

short, certain conclusions based or i exDerie^^RiT 
and ihma techniques may^ve Z^^tht^t 

petitSerm^t the0retiCal ^ 
SSir^ immunoassay sensitivity (21) sug- 

.x[(6xio»)(i + [AbW jjr Ab . J {fi) 



1?^? "J Burfa « d ^ty (binding siteeW) of BeDaop 

de^ oa bmit (znoleculea^nL). £.^^5? 

20 mole^e^then C * 2 .4 x 10 > molecuVe^mL 
e x io mol/L and the fractional occupancy oftiH 
binding sites of the sensor antibody bvth^L 
detectable concentration rfaSS^ooST 
show, the theoreticalaBBayseosiSa^b^ 1 

3P^o^ 

are essentially identical ^S,:^ 
venbonal competitive methodolomeri^ "f" 

iMaauyB an achievable, and (4) if labeli ofv.rTki.1 
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agn. ^ assays of conventional de- 

Ration occa. 

tenstics of micrespoT^y^ 6 -' . the ^etic chW 
«*arding this issue. Kth^T* flhouW * aai 
sewing antibody, the JoWr^^^ J«*"»P*5 
velocity of the Sib^^^?. Coi »^on 
at the limit (i.e, w^ 7 * 6 bmdla « ^on. so 
«t»«ted within the m^ZJart * "tibody 

Wics of the r^S^SZX Kn) 
homogeneous liquid-pha^e^t^ 0bs€rved in a 
f Active concenLti^X^' S? d> ^^"gh the 
faon medium is exceed^^^^ » *• in^ba. 

spot become occupied ia i«v«f 5 ^ ^ ^cro- 

Mbbody ib used, as in convener "^^on 0 f 
those of noncompetitive Jl, Particularly 
» mind the relabel J** -orda, bear^ 

antibody an?S^S/^ Ctional «*«P^cy 
above it i. «a^ de ^J^o^ nitio l SS M 3 
the ratio rises is greatest1vh»«7i. «t which 

^mentation whose fieldof v£w < ^ 
mjcrospot area, the highest £ to the 

concentration of sens^S,^ 00 ^) when the 
<fc°Mt Insert, ^ in "* ***** 

^'^^S^^ im- 
munoassay kcubatoon^J!^ ^efthat short 
amounts of antibody^ 



. Provides the basi. «f . 

^-rurrently availabS 

more 

■"""MP" •PPM*. *» feMibility of Se 

*•«-. of^^t 

1J* much s^^^^iliSiTaS* 
ated m a denned pC^f t 7 or ^*enatter.aiS. 
spontaneously eJt^\**^' >**ple. Election! 
«thode contribute to £ ^clf h0tom ^^J^ 

Slt3vit y-be minimized £Sj b f* m ^potaasayaea. 
^ents^er^ e F ^^tte design of aS 

" — - «* source 0^^^^ 
to diminish with future improvement in photomultiplier
design. Other sources of background include fluores-
cence emitted by components in the optical system
that may not, in current instruments, have been
constructed with background reduction as a prime con-
sideration. Nevertheless, they detect fluorescent signals
with high sensitivity. For example, one commercially
available microscope is claimed to detect fluorescein at a
density of 10 molecules/µm². Most commercially avail-
able fluorescein isothiocyanate (FITC)-labeled IgG ex-
hibits a fluorophore/protein ratio of ~4; this implies a
detection limit (Dmin) for antibody surface density of
two or three FITC-labeled IgG molecules per square micrometer.
This, in turn, implies a theoretical sensitivity for a

two-site unmunoassay of-JWJ x 10 s analyte molecule, 
per nuUibter. assuming identical parameter values m. 
above or W x 10- molecules**,* the Z^glX 
body has an affinity of 10- L^oi. dearly, senary 
may be increased by loading more fluorophore, either
directly or indirectly, onto the antibody.
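
The step from the instrument's fluorescein limit to an antibody surface-density limit is a single division, sketched below. The function name is illustrative; the inputs are the 10 molecules/µm² figure and the fluorophore/protein ratio of about 4 quoted in the text, and the second call (a ratio of 8) is a hypothetical case showing the effect of heavier labeling.

```python
def igg_detection_limit(fluorophore_limit_per_um2, fluorophores_per_igg):
    """Smallest detectable surface density of labeled IgG (molecules/µm²),
    assuming signal scales linearly with the number of fluorophores."""
    return fluorophore_limit_per_um2 / fluorophores_per_igg


print(igg_detection_limit(10.0, 4.0))  # ratio quoted in the text
print(igg_detection_limit(10.0, 8.0))  # hypothetical heavier labeling
```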

Our preliminary studies have relied on a less sensi-
tive microscope, albeit one possessing facilities for dual-
fluorescence measurement. Its argon laser emits two
««tation lines at 488 and 514 nm. It is thus partS 
larfr ffioent » exciting blue/green^mittinff fluoro- 
3* Station maximum SLj Z 

uless efficient m exoting fluorophores such as Texas 
R^exatetion maximum 596 nm). However, the rati" 

ES^*™^ ""M-* varf/ti^ 

detection efficiencies of the two labels because the sne- 

^labeled antibody species fornSg 
the antibody couplets can be chosen to yield signal 
ratios approximating unity. Inefficiency of the argon
laser in exciting Texas Red is thus not a major handicap
in this context. Though this instrument is based on a
conventional microscope and not on an optical system
designed for this purpose (and thus implicitly less sen-
sitive), it permits quantification of fluorescence signals
generated from microspots of any selected area. Initial
studies have revealed that under conditions that are

^ P ^ htt i 1 f b r eDt "' Ca * ah]e of detecting -X 
FITC-labeled and (or) 150 Texas Red-labeled IgG mole-
cules per square micrometer, while scanning an area of ~50
µm².

The development of microspot immunoassays has also
necessitated closer scrutiny of the mechanisms involved
in the coupling of antibodies to solid supports. In the
pr^t cnntext these should display a^Lt ^ 
adsorb (in the form of a monolayer)-^ to^SenUy 
lu^-* high surface density of antibody combined with 

low intrinsic signal-generating properties (e.g., low in-
trinsic fluorescence), thus minimizing background. We
have examined a number of candidate materials, such
as polypropylene, Teflon, cellulose and nitrocellulose
membranes, microtiter plates (clear polystyrene plates;
black, white, and clear polystyrene plates), glass slides,
and quartz optical fibers coated with 3-(aminopropyl)-
triethoxysilane, etc., and several alternative protocols
for achieving high monolayer coating densities. These



densities of functional antibodies (-5 xlSwf^T* 
cules/^Vfor assay deve^lnj aSou£ ^ 

SaSSf 1 JS' defid€Dde8 * tie aTtiboS 

deposition methods used constitute the principal sour™ 

of inroon in assay results ^ 

sensitivity that this implies. Clearly, this re^ e «S £ 

a^or^rstudyandreflnementof^ 

r^^/T^ Stations of present instru- 

the use of time-resolving techniques to distinguish two
individual fluorescent signals either from each other or
from background fluorescence) and the crudeness of
present methods for coupling antibodies onto small
areas, we have verified the theoretical concepts outlined
above by comparing the performance of J£ 

microspot format and when conven-
tionally designed. Although unoptimized, ratiometric
microspot assays have yielded sensitivity values closely
approaching those of conventional optimized IRMAs. As
an example, the results of a ratiometric assay for TSH
(thyrotropin), with use of Texas Red- and FITC-labeled
antibodies, are shown in Figure 13. Bearing in mind the

7^™^™ of and o^ctn^tion! 
al fluorophors when used aa immunoaaaay reasent 
labels such results are encouraging, dtS fiX£ 
work* clearly required to T^i^L 

The finding that highly sensitive immunoassays can
be performed with far smaller amounts of antibody than



[Figure 13. x-axis: TSH concentration (mU/L), 1 to 1000.]

Fig. 13. Response curve in a dual-labeled microspot immunoassay
FITC-Bvidin Developing antibody labeleo- with OJooV 
are currently used conventionally permits, in turn, the
construction of antibody microspot arrays enabling, in
principle, the simultaneous measurement of thousands
of different substances in 1-mL samples. In collabora-
tion with investigators at the Centre for Applied Micro-
biological Research, Porton Down, U.K., we are pres-
ently developing various techniques for the creation of
such arrays. Indeed, similar technologies have recently
been used for the parallel synthesis of several different
polypeptides, these enabling 10 000-microspot arrays to
be constructed on silica chips approximating 1 cm² (24).
Although arrays of this capacity are unlikely to ever be
required for conventional diagnostic purposes, we can
anticipate that the ability to simultaneously measure
many substances in the same sample will have revolu-
tionary consequences in medicine and other similar
areas. In addition, such techniques may ultimately
permit the individual analysis of the multiple isoforms
of certain heterogeneous analytes (e.g., the glycopro-
tein hormones), such molecular heterogeneity currently
presenting a major obstacle to the standardization and
interpretation of many immunological measurements
(25). Moreover, although these concepts have been illus-
trated in an immunoassay context, they are clearly
applicable to all "binding assays," including those rely-
ing on the use of DNA probes, hormone receptors, etc.
For example, labeled lectins that are specific in their
reactions with the sugar residues in the oligosaccharide
chains of glycoprotein molecules may be used, together
with specific antibodies, to impart additional "structural
specificity" to sandwich assays (26, 27), possibly over-
coming the limitations of antibodies per se in regard to
differentiation of the glycosylated variants of the gly-
coprotein hormones.
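
To give a sense of scale for the array density quoted above, the sketch below works out the area available to each spot and the spot-to-spot spacing for 10 000 microspots on a 1 cm² chip. It assumes a square grid that tiles the whole chip, which the text does not specify; the function name is illustrative.

```python
import math


def microspot_layout(chip_area_mm2, n_spots):
    """Area available per spot (mm^2) and centre-to-centre pitch (µm)
    for a square grid that tiles the whole chip."""
    area_per_spot = chip_area_mm2 / n_spots
    pitch_um = math.sqrt(area_per_spot) * 1000.0
    return area_per_spot, pitch_um


area, pitch = microspot_layout(chip_area_mm2=100.0, n_spots=10_000)  # 1 cm^2 chip
print(area, pitch)
```

Under that assumption each spot has about 0.01 mm² available and the pitch is about 100 µm.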

Summary and Conclusion 

Because of past confusion regarding the concepts of
precision, sensitivity, accuracy, etc., several erroneous
concepts have become incorporated within currently
accepted rules of immunoassay design. In particular,
much higher antibody concentrations are customarily
used than are necessary to achieve very high assay
sensitivity, provided that certain measurement strate-
gies are adhered to. In this presentation, we have
attempted to show that, in principle, the highest
sensitivities are obtained by confining a small number
of sensor antibody molecules onto a very small area in
the form of a microspot and measuring their occupancy
by an analyte by using very high-specific-activity "de-
veloping" antibody probes, thereby maximizing the sig-
nal/noise ratio in the determination of sensor antibody
occupancy. This observation, which contradicts cur-
rently accepted immunoassay design theory, in turn
makes possible the measurement of an unlimited num-
ber of different analytes on a chip of very small surface
area through the use of, e.g., laser scanning techniques
closely analogous to those used in compact disk tech-
niques of sound recording. Extensive experimental stud-
ies in this area, albeit conducted with relatively crude
techniques and instrumentation not specifically de-
Corrections
Vol. 37, pp. 1447-H: In our desire for rapid publication,
important errors were introduced into the following
Technical Brief. The corrected version is here repro-
duced in its entirety, with our apologies to the authors.

Rapid Detection of 1717-1 G→A Mutation in CFTR Gene
by PCR-Mediated Site-Directed Mutagenesis

C/rmonew ' Manuda Sew * Caanelina Magnam,' and 
Mauraw Ferrari 1 (> Istituto Scientifico H.S. Raffaele 
Lab. Centrele, Milano; 2 Istituti Clin, di Perfezionamento 
Lab. di Ricerche Clin., Milano, Italy) ' 




Among the non-ΔF508 mutations identified in the cystic
fibrosis transmembrane conductance regulator (CFTR) gene by
the Cystic Fibrosis (CF) Genetic Analysis Consortium, the
ones most frequently seen in our population sample are the
1717-1G→A mutation (13/144 or 9% of the CF chromosomes) and
the G542X mutation (16/190 or 8.4% of the CF chromosomes),
both revealed by dot-blot hybridization of the polymerase
chain reaction (PCR) product with allele-specific
oligonucleotide (ASO) probes (1). In an attempt to simplify
the analysis of the most [...] labeled ASO detection into
restriction endonuclease analysis of the amplified product.

PCR-mediated site-directed mutagenesis (2, 3) to detect the
G542X mutation by generating a novel BstNI site in the
wild-type sequence had already been suggested (4). To detect
the 1717-1G→A mutation, we designed the reverse primer
(5'-CTCTGCAAACTTGGAGA[...]-3') to contain a single-base
mismatch (T→G), which could create an AvaII restriction site
[G↓G(A/T)CC] in the amplified wild-type (WT) allele but not
in the CF mutant (M)



Fig. 1. Detection of the 1717-1G→A mutation by PCR
[Figure: WT 1717 and M 1717 allele sequences, showing the
AvaII site and the mutagenized base of the reverse primer.]

As forward primer, we used the one made available by the CF
Genetic Analysis Consortium to amplify exon 11 of the CFTR
gene: 5'-[...]AAAGCAAT[...].
Digestion by AvaII enzyme of the PCR product generates two
fragments of 116 and 21 bp in the wild-type allele and a
137-bp fragment in the mutant allele.

By combined analysis for the ΔF508 mutation (5) (252/470 or
53.6% of the CF chromosomes), 1717-1G→A and G542X, about 71%
of mutations might be detected by nonisotopic analysis of the
PCR product, thus allowing a faster and easier one-day
procedure for carrier screening and prenatal testing.
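The "about 71%" figure can be checked from the chromosome counts quoted in the brief. The sketch below is only an arithmetic illustration: the dictionary keys are labels chosen here, and it assumes the three mutations never coincide on one chromosome, so that their frequencies simply add.

```python
# Counts quoted in the text: (mutant chromosomes, chromosomes screened).
counts = {
    "dF508": (252, 470),
    "1717-1G>A": (13, 144),
    "G542X": (16, 190),
}


def frequency_percent(hits, total):
    """Allele frequency as a percentage of the CF chromosomes screened."""
    return 100.0 * hits / total


frequencies = {name: frequency_percent(*pair) for name, pair in counts.items()}

# Assumption: the mutations are mutually exclusive on a chromosome,
# so the per-mutation frequencies can be summed.
combined = sum(frequencies.values())

for name, value in frequencies.items():
    print(f"{name}: {value:.1f}%")
print(f"combined: {combined:.0f}%")
```

The three frequencies come from different sample sizes, so the sum is an estimate rather than a direct count.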
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A method for determining the ambient concentrations 
of a plurality of analytes in a liquid sample of volume V
liters, comprises

loading a plurality of different binding agents, each
being capable of binding specifically and reversibly
an analyte of interest, onto a support means at a plurality
of spaced apart locations such that not more than
0.1 V/K moles of binding agent are present at
any location, where K liters/mole is the equilibrium
constant of each such binding agent,
contacting the loaded support means with the sample
to be analyzed, such that each of the spaced apart
locations is contacted in the same operation with the
sample, the amount of sample liquid being such that
only an insignificant proportion of any analyte present
becomes bound to the binding agent, and
measuring a parameter representative of the fractional
occupancy by the analytes of the binding agents at
the spaced apart locations by a competitive or
non-competitive assay technique, using a labelled
site-recognition reagent for each binding agent capable of
recognising either the unfilled or the filled binding
sites on the binding agent, which enables the amount of
said reagent in the particular location to be measured.
A device and kit for use in the method are also provided.

17 Claims, 1 Drawing Sheet
DETERMINATION OF AMBIENT CONCENTRATION OF SEVERAL ANALYTES

This is a continuation of co-pending application Ser.
No. 07/4*0,878, filed as PCT/GB88/00649, Aug. 5, 1988.

FIELD OF THE INVENTION

The present invention relates to the determination of
ambient analyte concentrations in liquids, for example
the determination of analytes such as hormones, proteins
and other naturally occurring or artificially present
substances in biological liquids such as body fluids.

BACKGROUND OF THE INVENTION

[...] WO84/01031 to measure the concentration of an analyte
in a fluid by contacting the fluid with a trace amount of a
binding agent such as an antibody specific [...] reversibly
binds the analyte but not other components of the fluid,
determining a quantity representative of the proportional
occupancy of binding sites on the binding agent and
[estimating] from that quantity the [analyte concentration.
In that] application I point out that, provided that the
amount of binding agent is sufficiently low that its
introduction into the fluid causes no significant diminution
of the concentration of ambient (unbound) analyte, the
fractional occupancy of the binding sites on the binding
agent by the analyte is effectively independent of the
absolute volume of the fluid and of the absolute amount of
binding agent, i.e. independent within the limits of error
usually associated with the measurement of fractional
occupancy. In such circumstances, and in these circumstances
only, the initial concentration [H] of analyte in the fluid
is related to the fraction (Ab/Ab_0) of binding sites on the
binding agent occupied by the analyte by the equation:

$$\frac{Ab}{Ab_0} = \frac{K_{ab}[H]}{1 + K_{ab}[H]}$$

where K_ab (hereinafter referred to as K) is the equilibrium
constant for the binding of the analyte to the binding sites
and is a constant for a given analyte and binding agent at
any one temperature. This constant is generally known as the
affinity constant, especially when the binding agent is an
antibody, for example a monoclonal antibody.

The concept of using only a trace amount of binding agent is
contrary to generally recommended practice in the fields of
immunoassay and immunometric techniques. For example, in such
a well-known work as [...], ed. S. A. Berson and R. S. Yalow,
1973 at pages 111-116 it is proposed that in the performance
of a competitive immunoassay maximum sensitivity of the assay
is achieved if the proportion of the "tracer" analyte that is
bound approximates to 50%. In order to [...] the theory of
Berson and Yalow, to this day generally accepted by other
workers in the field, requires that the [...] be greater than
or equal to the reciprocal of the equilibrium constant (K) of
the [...], i.e. [Ab] > 1/K. For a sample of volume V the total
amount of binding agent (or binding sites) must therefore be
greater than or equal to V/K. A binding agent which is a
monoclonal antibody may, for example, have [...] 10^11
liters/mole for the specific antigen to which it [...]
above-mentioned generally accepted practice [...] of the order
of 10^-11 moles/liter or more is required for binding [...] of
such an equilibrium constant and, with fluid sample volumes of
the order of 1 milliliter, the use of 10^-14 or more mole of
binding agent (or site) is conventionally deemed necessary.
Avogadro's number is about 6 × 10^23 so that 10^-14 mole of
binding site is equivalent to more than 10^9 molecules of
binding agent even assuming that the binding agent possesses
two binding sites per molecule. [...] In fact, most [...]
marketed commercially at the present time conform to these
concepts [...] amount of binding site approximating to or
[...] V/K [...] in the case of labelled [...] it is
conventional to use as much binding [...] possible, binding
proportions of analyte [...].

[With the] binding of substantial proportions, for example
50%, of the analyte in the liquid samples under [test in] such
systems, the fractional occupancy of the binding sites of the
binding agent is not independent of the volume of the fluid
sample so that for accurate quantitative assays it is
necessary to control accurately the volume of the sample,
keeping it constant in all tests, [whether] of the samples of
unknown concentration or of the standard samples of known
concentration used to generate the dose response curve.
Furthermore such [systems] require careful control of the
amount of binding agent present in the standard and control
incubation tubes. These limitations of present techniques are
universally recognised and accepted.

GB Patent Application 2,099,578A discloses a device for
immunoassays comprising a porous solid support to which
antigens, or less frequently immunoglobulins, are bound at a
plurality of spaced apart locations, said device permitting a
large number of qualitative or quantitative immunoassays to be
performed on the same support, for example to establish an
antibody profile of a sample of human blood serum. However,
although the individual locations may be in the form of
so-called microdots produced by supplying droplets of
antigen-containing solutions or suspensions, the number of
moles of antigen present at each location is apparently still
envisaged as being enough to bind essentially all of the
analyte (e.g. antibody) whose concentration is to be measured
that is present [in the sample. This] is apparent from the
fact [that the] quantitative method used in that application
(page 3 [...]) involves calibration with known amounts of
[...] extracted from the sample in order for a [...] to be
made and hence that large [...] in this situation [...].
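The occupancy relation described in the Background, where the fraction of binding sites occupied depends only on K and the ambient concentration [H], can be illustrated numerically. This is a minimal sketch assuming the usual mass-action form F = K[H] / (1 + K[H]) with negligible analyte depletion; the function names and the example values of K and [H] are illustrative and do not come from the patent.

```python
def fractional_occupancy(k, h):
    """F = K[H] / (1 + K[H]); k in liters/mole, h in moles/liter."""
    return k * h / (1.0 + k * h)


def ambient_concentration(k, f):
    """Invert the relation: [H] = F / (K (1 - F)), valid for 0 <= F < 1."""
    if not 0.0 <= f < 1.0:
        raise ValueError("fractional occupancy must lie in [0, 1)")
    return f / (k * (1.0 - f))


K = 1e11   # illustrative affinity constant, liters/mole
H = 1e-11  # illustrative ambient analyte concentration, moles/liter

F = fractional_occupancy(K, H)         # K[H] = 1, so half the sites are filled
H_back = ambient_concentration(K, F)   # recovers the starting concentration
print(F, H_back)
```

Neither function takes the sample volume or the amount of binding agent as an argument, which is the point the passage makes about trace amounts of binding agent.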



SUMMARY OF THE INVENTION 
The present invention involves the realisation that the
use of high quantities of binding agent is neither necessary
for good sensitivity in immunoassays nor is it generally
desirable. If, instead of being kept as large as possible,
the amount of binding agent is reduced so that only an
insignificant proportion of the analyte is reversibly bound
to it, generally less than 10%, usually less than 5% and for
optimum results only 1 or 2% or less, not only is it no longer
necessary to use an accurately controlled, constant volume for
all the liquid samples (standard solutions and unknown
samples) in a given assay, but it is also possible to obtain
reliable and sometimes even improved estimates of analyte
concentration using much less than V/K moles of binding agent
binding sites, say not more than 0.1 V/K and preferably less
than 0.01 V/K. For a binding agent having an equilibrium
constant (K) for the analyte of the order of 10^11 liters/mole
and samples of approximately 1 ml size this is approximately
equivalent to not more than 10^9, preferably less than 10^8,
molecules of binding agent at each location in an individual
array. If the value of K is 10^13 liters/mole the figures are
10^7 and 10^6 molecules respectively, and if K is of the order
of 10^8 liters/mole they are 10^12 and 10^11 molecules
respectively. Below 10^2 molecules of binding agent at a
single location the accuracy of the measurement would become
progressively less as the fractional occupancy of the binding
agent sites by the analyte would be able to change only in
discrete steps as individual sites become occupied or
unoccupied, but in principle at least the use of as low as 10
molecules would be permissible if an estimate with an accuracy
of 10% is acceptable. Practical considerations may give rise
to a preference for more than [...] molecules.

It will be appreciated that the abovementioned GB patent
application 2,099,578A, which for quantitative estimation
relies on large amounts of binding agent and essentially total
sequestration of all analyte, fails to recognise the advance
achieved by the present invention, which instead relies on a
different analytical principle requiring measurement of the
fractional occupancy of the binding agent and which thus
requires only a very low proportion of the total analyte
molecules present to be sequestered from the sample.

Following the recognition that the use of such small amounts
of binding agent is permissible, it becomes feasible to place
the binding agent required for a single concentration
measurement on a very small area of a solid support and hence
to place in juxtaposition to one another, but at spatially
separate points on a single solid support, a wide variety of
different binding agents specific for different analytes which
are or may be present simultaneously in a liquid to be
analysed. Simultaneous exposure of each of the separate points
to the liquid to be analysed will cause each binding agent
spot to take up the analyte for which it is specific to an
extent (i.e. fractional binding site occupancy) representative
of the analyte concentration in the liquid, provided only that
the volume of solution and the analyte concentration therein
are large enough that only an insignificant fraction
(generally less than 10%, usually less than 5%) of the analyte
is bound to the point. The fractional binding site occupancy
for each binding agent can then be determined using separate
site-recognition reagents which recognise either the
unoccupied or the occupied binding sites of the different
binding agents and which are [labelled with] markers enabling
the concentration level [...].

Such measurements may be performed consecutively, for example
using a laser which scans across the support, or
simultaneously, for example using a photographic plate,
depending on the nature of the labels. Other imaging devices
such as a television camera can also be used where
appropriate. Because the binding agents are spatially separate
from one another it is possible to use only a small number of
different marker labels, or even the same marker label
throughout, and to scan each binding agent location separately
to determine the presence and concentration of the label. By
use of the invention considerably more than 3 analytes can be
[...] with liquid to be analysed, for example 10, 20, 30, 50 or
even up to 100 or several hundreds of analyses.

The present invention provides a method of determining the
ambient concentrations of a plurality of analytes in a liquid
sample of volume V liters, comprising:


loading a plurality of different binding agents, each
being capable of reversibly binding an analyte
which is or may be present in the liquid sample and is
specific for that analyte as compared to the other
components of the liquid sample, onto a support
means at a plurality of spaced apart locations such
that each location has not more than 0.1 V/K
moles of a single binding agent, where K liters/mole
is the equilibrium constant of the binding
agent for the analyte,

contacting the loaded support means with the liquid
sample to be analysed such that each of the spaced
apart locations is contacted in the same operation
with the liquid sample, the amount of liquid used in
the sample being such that only an insignificant
proportion of any analyte present in the liquid sample
becomes bound to the binding agent, and
measuring a parameter representative of the fractional
occupancy by the analytes of the binding
agents at the spaced apart locations by a competitive
or non-competitive assay technique using a
site-recognition reagent for each binding agent
capable of recognising either the unfilled binding
sites or the filled binding sites on the binding agent,
said site-recognition reagent being labelled with a
marker enabling the amount of said reagent in the
particular location to be measured.
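The loading limit in the method above (not more than 0.1 V/K moles of binding agent per location) can be turned into an approximate molecule count. The sketch below is illustrative only: the 1 ml sample volume and K of 10^11 liters/mole are example values, the helper names are invented here, and the rounded Avogadro constant is assumed.

```python
AVOGADRO = 6.022e23  # molecules per mole (rounded)


def max_moles_per_location(volume_liters, k, factor=0.1):
    """Upper limit on binding agent at one location: factor * V / K moles."""
    return factor * volume_liters / k


def moles_to_molecules(moles):
    """Convert an amount in moles to a number of molecules."""
    return moles * AVOGADRO


V = 1e-3  # illustrative 1 ml sample, in liters
K = 1e11  # illustrative equilibrium constant, liters/mole

limit = max_moles_per_location(V, K)                   # 0.1 V/K
preferred = max_moles_per_location(V, K, factor=0.01)  # 0.01 V/K

print(f"0.1 V/K  = {limit:.1e} mol = {moles_to_molecules(limit):.1e} molecules")
print(f"0.01 V/K = {preferred:.1e} mol = {moles_to_molecules(preferred):.1e} molecules")
```

Because the limit scales with V/K, a tenfold larger sample or a tenfold lower affinity constant raises the permitted loading tenfold.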
The invention also provides a device for use in determining
the ambient concentrations of a plurality of
analytes in a liquid sample of volume V liters, comprising
a solid support means having located thereon at a
plurality of spaced apart locations a plurality of different
binding agents, each binding agent being capable of
reversibly binding an analyte which is or may be present
in the liquid sample and is specific for that analyte as
compared to the other components of the liquid sample,
each location having not more than 0.1 V/K, preferably
less than 0.01 V/K, moles of a single binding agent
[...] reaction with the [...].

A kit for use in the method according to the invention
[...] samples containing known concentrations
of the analytes whose concentrations in the liquid
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simple are to be measured and . set of labeUed site- means to be such that liquid samnle, f .™m»»„ j« 

recognition reagents for reaction with filled or unfilled the volume V liter* are S^Kc/i^^J^ 

binding sites on the binding .gent,. the plurality of spaced ^iJS^Zt^^t 

In arriving at the method of the invention, I have different bindingVgenu * For e^£ I™ ^ 
found that, generally spewing, for antfcodies having an 5 locations noa/be^ar^ 

affinity constant X liters/mole for an antigen, the rela- means, and a plurality of welK llf^ 'ff 

S^cf^^ 01 ^oup of S&fbmd^^K^pS; 

fractional occupancy of the bmdmg sites at any pamcu- locations, can be linked together tTfo™ TmW«£~ 

tar antogen concentration and «e relationship between plate for use withM^ of^pleT 

the inobody concentration and the pcrccntacc of anti- in u/w r _/ r U1 ™P J «- 

gen bound £ the binding sites at any^aSx t£L wi^a measu^^T? « 

concentration follow the same curv« provided ffi5 mTeriJ TT e nksti^ £ ^ * 

antibody concentrations and the antigen concentration! o^T'to iV^f ' , " PP ° n U **** 

15 bIack - $ «ch as carbon black, when the signals to be 
BRIEF DESCRIPTION OF THE DRAWINGS measured from the binding agent or the site-recognition 
The principle underlying the method of the invention cTnfmarTen V^a^T^V 1 "^ 
may be better understood by reference to the accompa- fcSdfS ^^^Z^^onl'Z 
nymgdrawmg which is a graph representing two sets of 20 detecting instrument w rtott^L rfS £ r i 
curve, plotting the relationship between antibody con- choice of option TL^S^S^t^^ 
cen.rat.on and the fractional occupancy of the binding attach the binding agendo «f S te rtS? rf 

r ri"oS p ^frur y ~™r n s s background ^iSir^ 

terms of 1/K. plottj along the TJ£. F^SH? of b3owbv.^« 0 f . " 

curves which remain constant or deefce withincfeal Sri,™ of 8 ?P"J™ polystyrene microli- 
inglAb], the y^xis represents fctaSLlS^SS 30 £ Sade ~ SS^^n^ 
(F) of binding sites on the antibody by the antigen- for TvTLhS =17 , Microfluor microliter wells, 
the second set. the y-aais represents £e pwc«uee difl?4J? u"^ my * bindin * ° f 

. dependent on [An]. f Q ^ Jj^s?S^~^ £££ 
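The two families of curves described for the drawing (fractional occupancy and percentage of antigen bound, both plotted against antibody concentration in units of 1/K) can be reproduced from the law of mass action. This is a sketch under stated assumptions: simple 1:1 reversible binding, one site type, and all concentrations expressed in units of 1/K. The function names are invented here.

```python
import math


def bound_complex(ab_total, an_total):
    """Equilibrium bound concentration, all quantities in units of 1/K.

    With K scaled to 1, mass action gives b = (ab - b) * (an - b), i.e.
    b**2 - (ab + an + 1) * b + ab * an = 0. The smaller root is the
    physically meaningful one because b cannot exceed ab or an.
    """
    s = ab_total + an_total + 1.0
    return (s - math.sqrt(s * s - 4.0 * ab_total * an_total)) / 2.0


def occupancy_and_percent_bound(ab_total, an_total):
    """Return (fractional occupancy F, percentage of antigen bound)."""
    b = bound_complex(ab_total, an_total)
    return b / ab_total, 100.0 * b / an_total


for ab in (0.001, 0.01, 0.1, 1.0, 10.0):
    f, pct = occupancy_and_percent_bound(ab, an_total=1.0)
    print(f"[Ab] = {ab:>6} /K   F = {f:.3f}   antigen bound = {pct:.1f}%")
```

At antibody concentrations well below 0.1/K the occupancy stays close to its limiting value while only a small percentage of the antigen is bound, which is the regime the method relies on.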

DETAILED DESCRIPTION 40 considerable ranges, for example 2 or 3 orders of magni- 

The choice of a solid support is a matter to be left to n-™™,*^ HCG °>«^«rement in urine of 

the user. Preferably the is Tso^at S/rf ^ ^ " ^ ^ ^ °" 1 <° 100 ° r 

like the sucDortt used in GB 2 099 578a C/W», " ■"imiyconsiants oi 10'*-10» liters/mole can also be 
w«1tn £ ™™< !^.v^ v. Uicd - The wvention can be used with sucb binding 

t^^^t ^ 0f mole ' whi ch are not themselves labelled. Howev« ™ £ 

The support means may compnse microbeads. e.g. of two signals and thus eliminates the need £ !2e£ 

such a plasucs ma.erul. which can be coated with uni- same amount of labelled bmdbg agenttn^e Vu^dS 

form layers f bmdmg agent and retained in specified when measuring signals from staidtd sa^nte foS 
o-oc^e^g.ho^ws.on.supportpUte.Alurn.tively 65 bration purposes afwhen^e^g S t from^fe 

the material may be m the f rm fa sheet or plate wWch unknown samples. Because thT wtL o>tLd?«w! 

b spotted with an array of dot, of binding agent. It can on m e n ma ^ 0 titZ^S^SSiS& 

be advantageous for the configur«ion of the support occupancy, there is also no^eeTS Tmelsu* 
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from the entire ipot but icaaaiae only a DonioB « tv • ^ 

dent^bindmg.gentBprefer.blJbblKX JZjTZZ^ ««» * the method 

sane label but different labels can be used according to the mvennon may themselves be anlboo- 

Tie binding agents may be applied to the mmm ;„ ^ 8 ' monod ° nsJ antibodies, and may be ano-idioty- 

anyofthewayskaowno/ccmvffio^y^K? s £L° ■ T^JT ^ 

ing binding agents onto supports ,ucb « ^!" P ^ ^ f^vely, for couple for aSytef 

ample by contacting each^ed .paction on " ^J"^ SUCh * CT4), unoS 

support with a solution of the bindinTacent in Aefo™ f,l!?r ^ r ^ co « nued <*">g either the analyt* 

of a small drop, e.g. 0.5 microlitovo? , ?Sn" spot a£ 1^"^^ v"*"* 0r ^ eovalentjy 

allowing them to remain in contact for , ^ Jf?r. coupled to another molecule-ta a nmtrin 
before w«hing ^ 0x0^.^°^^ " ffSSfi* " liffK 

small fraction of the binding agent preserTin thedron ^ ^1° gHm °* labeUed directly or 

becomes adsorbed onto ^„5p 0 „ P « ries^of ^ tSSSSS^ ""S"" 0 " bbds ~* " 

procedure. It is to be noted that the coatine derJritv^f P u ° resce, a.fhodamuieor Texas Red or materials usable 

b^gagentonmer^crc^tdoJn^a^el is IS^TS «* aYeto^I 

than the coa«mg density in conventional aXdJ " £? o^X^^^ * man- 
coated tubes; the reduction in the number of molecde. SiJ 7 . ^ ?= h 84 chemuuminescem, enzyme or 
oneachspotn»ybe.cWevedsolelyby^uc"o7o?S ^Z2 0 ^ ,My be * 
size of the spot rather than the coating density A hiek «^!?2!f? 0n '!? geot B Preferably labelled with the 
coating density is generally desirables 20 wmT^J?™ ^ can be used in differed 

nal/noise ratios. The sizes of the spots are advantf- ™"««>gnit»0n reagents may be specific 

geously less than 10 mm* preferably less ttal ^3 each \?Z T »8«"«->yte apots in 

The separation is desirably, but not necessarily, 2 ™ 3 d v™? P ^ " " circumstances, as with 

t.mes the radius of the spot, or more. These suggested T?* mch 88 HCG and FSH which 

geometries can nevertheless be chanced as r«mir«i ■>< common binding site, they may be cross-reacrinir 

being subject solely to the **JZtt££gZ " SS^LV?* « ^dmg TtcT^ 
binding agent molecules in each sdol the mi one °' the spots. 

volume of the sample to which the array of spo^wX the T «Presentative of 

exposed and the means locally available for conve! ^^°/^^ Cy ^^^^lin^t 
Preparing an array of spots in the manner de- 30 ^^^ZVj^^^ 

Once the binding agents have been coated onto the IT*"* ^P 1 " containing known con- 
support it is conventional practice to wash the sujpoa f^' Suc h«£dard samples 
m the case of antibodies as binding agents, with " foU? ^ T ? nlaiD ^ * e »«cther. provided that 
tion containing albumen or other protein to satur*,, .11' ,< „_ , ™ » P 1 **"" "> some of the standard 
remaining nonspecific adsorption sites on Ifc ^J'^^ may be measured by 
and elsewhere. To confirm that the amount of KX * 8 . ^T"* bmding a,cs (« with an anu- 
agent in an individual spot will be less than Us?2? ™alyte anubody) or unoccupied binding sites (as with 
mum_ amount (0.1 V/K) required to confoim Troth, ™ f. n -' a,ot yP>c anubody). as one is the converse of the 
prmcple of the present invention, the amount of bind! 40 S fraSn^'? " b desirabIe to 
mg agent present on any individual site can be chJewln f ,°" whjch 15 closcr 10 because a chance in 
by labelling «, : binding agent with a oet^aWe^to JS^SaT^ °/ °V 01 * ™°™™*7£££ 
of known specific activity fue. known amount of nwto 1? ^^I** 1 f ° r fracti o»«l occupancies i„ the 
per urn, weight of binding agent) and measuSfS T* alteraative » «cneraDy satisfac! 
amount of marker present Thus, if thm 1 u T lOTy ' 

bmder 1, « ot ^ on me^d^^ ^ ^ rel!esrt: mb n 0diIDC,I, ° f ^ P — **— which 

method of the invention the binding .gem cannevSthe T !w " tW ° nuorescent ^ers, the measurement of 

less be labeDed in a trial experiment. KSSdSS one on e ,K U " I ! nS i ty ° f signals from °« two maSert, 

uons to those found in that trial to give rise w c™ • he bwdjng agco, 800 the other on the site reco£ 

toadings of bindiag agent can be £ I 50 confo^ 6 " 1, be Carried out «* * '«Sg 

beUed bmduig igent to the supports to be acruaTly uS. MRclio a C vS P % SUCh » aS 8 Bi °- Rad 

The minimum size of the liquid sample (V lhenlh ■ iv ^ b]e fl0a Bl o-^d Laboratories Ltd 

corrected with the number of mole of Wndms ^ "^'T* " CbaMel de,ec,ion s y s ««»- This intxu 

Oess than 0.1 V/K) so that only an insigr^cSrofo? ^2/?" °" * bcaa> to «» ^ *«• «^ the Se 

bound to the bmding agent This proportion is 8 elengt ^ / dters «° distinguish and measure the 

general rule less than 1(W6, usually le^harT 5 % »d 0UaU ° f " uorescen « «ni"ed. Time-resolv^ fluo- 
derirably 1 or 2% or less, dependmgTn S^acct^ methods ™»V also be used. Interference 

desired I for the assay (greater^ccS, Z£l oSed ^^T"^ betWeen the ^° ^ndfc^ ^ 
other things being equal, when smalTer pT^poStf « f^!? °\ ^ SlMdard co «ctions if i, occurs or 
l-ialyje are bound) and the magnitude oS?S£ " SZTSL'T"*?* * l ° reduce il " 
mtroducmg facton present Sample sizes of the order" ^ L,n^ ° . ftuorescent si S^ emitted by the 
one or a few ml or less, e.g. down to 100 nS„„ SS, £S ^ u a « on »P^ted in the present form 
less, are often preferred, but circurnstances^y^! m$mi,Dent - by Uters capable of distinguishing 

when larger volumes are more convenje^ly^S 6J ^™?" SUC wavelength of the two fluorescent 
and the geometry may be adjusted accordin^T^ f™" 10 "; however, fluorescent substances may be db 
sample^naybeusedatitsnatuilc JSSSSfc*", SS?** physical ^"eristics such I 

if desired it may be diluted to a known extent Jf^ """^ce decay times, bleaching times, 

etc., and any of these means may be used, either aJoww 
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in combination, t differentiate between two flnnr«. 10 

phores and hence pern^aeasureaenrf^ntioTf "! CT0 * rami/mJ » ' ™ed andOS tniooBteralianotirf 
tw fluorescent labelled entities (binding aE £, °J? to J 0 *** art added in the forTcY £SeS2 £ 

using techniques well known in the fluorescence mea < P 5 , . 

surement field. When only e^flwJSKw . After of the droplets the plate i, left for . few 

ent fce same technique, nay be used, provided^,^ J™"? " * hu ™ d »«»«phere to prevent el^tiLrf 
u taken to scan lie entire spot in each ca* an" ^ *°P ,e ?- during this time son* oH^Sodv 

comam essentially the same amount of binding 2 m k ,he ^P 1 ™ become adsorbed 

&omone «ay to the n«t when the unknown a?d io ^ " e severanSuTwItK 

standard samples are used. 0 10 Phosphate buffer and then thev are fin^ ^^TT. . * 

In the case of other labels, such as radioisotopic la- J*"*"- 
beh, chemflummescen, Ubds or enzyme labels, St S™ ? SatU T" te the reri ««ual binding siteTm *73? 

?swaKaa » ~T *— — **— « 

EM5=3*£? beTnTegi= 20 ^^^^IS^^ 
°T "*» chernihmunescenT "53£ * S"3 of ^lum? £ S£, B I JS? 

drfTerent chen^uminescent lifetime or wavdengft of V f »* X,0 ~ M moles <e*nvE^ x ?oH£! 
hght emjssion, by techniques well known in th™reW , Cules) f or the anti-TNF antibod/and 7 V ft)- M ™ , 

The invents may be used for the assaying of ana Mtibody ' ^ fW 41,6 «"-HCG 

lytes present in biological fluids, for V 

body fluid, such as b!o^ ^J^^ uSe EXAMPLE 2 

may be used for the assaying of a wide variety of hor- , A "•"oliter plate prepared as describe ;„ c„ , 
mones. proteins, enzymes or other analytes which «e 30 used in an assay for nanifa^^Z^f' 

either present naturally in the liouid «m»u ~ Y 30 containine TNF *nrt Hrv- a . y P* 00 " 6 * 1 solution 

present artificially such « dn^isSr ^^^^^^ 

content ^of wluch , are ; mcorporated herein by reference mcros «>P<- From the standard wlutions dose 
p.^ " by ^ ^Uowmg Eaam- ^J^^ ^ ^ 

EXAMPLE 1 55 

An anti-TNF (tumour necrosis factor) ^tibodv h.v ™<^««*» ' nrcn.n 

ing an affinity constant for TNF^Vv r Y . B ^»' i Kai <lnnr ^^r- °» TNF «p« 

IX^hWmoleis labdTed^Te^ iSd °it2 17 

solution are added m the form of droplets one to each - 

X^m^t^ Md ^ <°< "CO being as foltews: 

An anu-HCG (human chorionic gonadotrooin^ anti 

"^constant for HCG « fs'C 2f HC ° ~ Mh - mC " ' 

about 6X10» bters/mole is also labelled with Tea« UxMi Xrt 1'"°^^^-°° HCG w 

Red. A soluuon of the antibody at a concentration ofW 
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-continued 



Pi 1 1 Ocoreiceneg 
'iotas Red fluorescence 0n HCG 4pot 
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The artificially produced solution was found to give 
ratio readings of 5.9 on the TNF spot and 10.5 on the 
HCG spot, correlating well with the actual concentra-
tions of TNF (0.5 ng/ml) and HCG (0.5 ng/ml) ob-
tained from the dose response curves. 

EXAMPLE 3 
Using similar procedures to those outlined in Example 1
a microtiter plate containing spots of labelled anti-T4
(thyroxine) antibody (affinity constant about
1 × 10^[...] liters/mole at 25° C.), labelled anti-TSH
(thyroid stimulating hormone) antibody (affinity constant
about 5 × 10^[...] liters/mole at 25° C.) and labelled anti-T3
(triiodothyronine) antibody (affinity constant about
1 × 10^[...] liters/mole at 25° C.) in each of the individual
wells is produced, the spots containing less than
1 × 10^-[...] V moles of anti-T4 antibody or less than
2 × 10^-1[...] V moles of anti-TSH antibody or less than
1 × 10^-[...] V moles of anti-T3 antibody.

The developing antibody (site-recognition reagent)
for the TSH assay is an anti-TSH antibody with an
affinity constant for TSH of 2 × 10^[...] liters/mole at
25° C. This antibody is labelled with fluorescein (FITC).
The site-recognition reagents for the T4 and T3 assays
are T4 and T3 coupled to poly-lysine and labelled with
FITC, and they recognise the unfilled sites on their
respective first antibodies.

Using 400 microliter aliquots of standard solutions
containing various known amounts of T4, T3 and TSH,
dose response curves are obtained by methods analogous
to those in Example 2, correlating fluorescence
ratios with T4, T3 and TSH concentrations. The plate is
used to measure T4, T3 and TSH levels in serum from
human patients with good correlation with the results
obtained by other methods.
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EXAMPLE 4 
Using similar procedures to those outlined in Example 1
a microtiter plate containing spots of first labelled
anti-HCG antibody (affinity constant about 6 × 10^8
liters/mole at 25° C.), second labelled anti-HCG antibody
(affinity constant about 1.3 × 10^[...] liters/mole at
25° C.) and labelled anti-FSH (follicle stimulating hormone)
antibody (affinity constant about 1.3 × 10^8 liters/mole
at 25° C.) in each of the individual wells is produced,
the spots each containing less than 0.1 V/K
moles of the respective antibody. A cross-reacting
(alpha subunit) monoclonal antibody 8D10 with an
affinity constant of 1 × 10^[...] liters/mole is used as a
common developing antibody for both the HCG and the
FSH assays.

Using 400 microliter aliquots of standard solutions 
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in the urine of women in pregnancy testing, giving pood 
correlations with results obtained by other means and
achieving effective concentration measurements for
HCG over a concentration range of two or three orders
of magnitude by correct choice of the best HCG spot
and dose response curve.
Production of labelled antibodies 

The labelling of the antibodies with fluorescent labels
can be carried out by a well known and standard technique,
see Leslie Hudson and Frank C. Hay, "Practical
Immunology", Blackwell Scientific Publications (1980)
pages 11-13, for example as follows:

The monoclonal antibody anti-FSH 3G3, an FSH
specific (beta subunit) antibody having an affinity
constant (K) of 1.3 × 10^8 liters per mole, was produced in
the Middlesex Hospital Medical School and was labelled
with TRITC (rhodamine isothiocyanate) or
Texas Red, giving a red fluorescence.

The monoclonal antibody anti-FSH 8D10, a cross-reacting
(alpha subunit) antibody having an affinity
constant (K) of 1 × 10^[...] liters per mole, was likewise
produced in the Middlesex Hospital Medical School
and was labelled with FITC (fluorescein isothiocyanate),
giving a yellow-green fluorescence.

The general procedure used involved ascites fluid 
purification (ammonium sulphate precipitation and 
T-gel chromatography) followed by labelling, accord- 
ing to the following steps: 
l.a- Ammonium sulphate purification 

1. Add 4.1 ml saturated ammonium sulphate solution 
to 5 ml anti body preparation (culture supernatant or 1 -5 
diluted ascites fluid) under constant stirring (45% satu- 
ration). 

2. Continue stirring for 30-90 min. Centrifuge at 2500 
rpm for 30 min. 

3. Discard the supernatant and dissolve the precipi- 
late in PBS (final volume 5 ml.). Repeat Steps 1 and 2, 

4. Add 3.6 ml saturated ammonium sulphate (40% 
saturation) under constant stirring. Repeat Step 2. 

5. Discard the supernatant and dissolve the pellet in 
the desired buffer. 

6. Dialyse overnight in cold against the same buffer 
(using fresh, boiled-in-d/w dialysis bag). 

7. Determine the protein concentration either at A210 
or by Lowry estimation. 

l.b. T-gel Chromatography: (Buffer: 1M Tris-Cl. pH 
7.6. Solid potassium sulphate) 

1. Clear 2 ml of ascites fluid by centrifugation at 4O00 
rpm. 

2. Add 1M Tris-Cl solution to achieve final concen- 
tration of 0.1 M. 

3. Add sufficient amount of solid potassium sulphate 
Final concentration: = 0.5M. 

4. Apply the ascite fluid to the T-gel column 

5. Wash the column with 0.1 M Tris-Cl buffer contain- 
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containing various known concentrations of HCG and - — — «»™n w, uu .i M i ns-u buffer contain- 
, dose response. curves are obtained by methods 60 ">« °-5M potassium sulphate, until protein profile fat 
analogous to those in Example 2, correlating fluores- A ~" 1 
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cence ratios with HCG and FSH concentrations, the 
curve obtained with the higher affinity anti-HCG anti- 
body giving more concentration-sensitive results at the 
lower HCG concentrations whereas the curve from the 
lower affinity anti-HCG antibody is more concentra- 
tion-sensitive at the higher HCG concentrations. The 
plate is used to measure HCG and FSH concentrations 



65 



Ajgo) returns to zero. 

6. Elute the absorbed protein using 0.1M Tris-Cl 
buffer as the eluant. 

7. Pool the fractions containing antibody activity and 
concentrate using Amicon 30 concentrates 

*• ^ Purification is to be carried out, use 

HPHT chromatography Starting buffer during Step 7 
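
The saturation figures quoted in the ammonium sulphate purification (45% after adding 4.1 ml of saturated solution to 5 ml of preparation, and 40% after adding 3.6 ml to 5 ml) can be checked with a short calculation. The Python sketch below is only an illustration: it assumes that volumes are simply additive and that the starting preparation contains no ammonium sulphate, and the function names are hypothetical rather than taken from the source.

```python
def percent_saturation(sample_ml, saturated_ml, initial_percent=0.0):
    """Estimate percent saturation after mixing.

    Assumption: volumes are additive, so the result is a plain
    volume-weighted average of the two starting saturations.
    """
    if sample_ml <= 0 or saturated_ml < 0:
        raise ValueError("volumes must be positive")
    total_ml = sample_ml + saturated_ml
    return (initial_percent * sample_ml + 100.0 * saturated_ml) / total_ml


def saturated_volume_needed(sample_ml, target_percent, initial_percent=0.0):
    """Volume of saturated solution needed to reach a target saturation."""
    if not 0.0 <= initial_percent < target_percent < 100.0:
        raise ValueError("need 0 <= initial < target < 100")
    return sample_ml * (target_percent - initial_percent) / (100.0 - target_percent)


# Step 1: 4.1 ml saturated solution added to 5 ml of antibody preparation.
first_cut = percent_saturation(5.0, 4.1)
# Step 4: 3.6 ml saturated solution added to the 5 ml redissolved precipitate.
second_cut = percent_saturation(5.0, 3.6)

print(f"first cut:  {first_cut:.1f}% saturation")
print(f"second cut: {second_cut:.1f}% saturation")
print(f"volume for 45% in 5 ml: {saturated_volume_needed(5.0, 45.0):.2f} ml")
```

Under these assumptions the first addition comes out at about 45%, matching the protocol, while the second comes out at about 42%, slightly above the nominal 40% quoted in Step 4.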
2. Labelling of Antibodies: FITC/TRITC conjugation

1. [illegible] purified IgG protein in 0.25M
Carbonate-bicarbonate buffer, pH 9.0 to a concentration of
[illegible].

2. Add FITC/TRITC to achieve a 1:20 ratio with
protein (i.e. 0.05 mg for every 1 mg of protein).

3. Mix and incubate at 4° C. for 16-18 hrs.

4. Separate the conjugated protein from unconjugated by:
a. Sephadex G-25 chromatography for FITC label, or
b. DEAE-Sephacel chromatography for TRITC.

Buffer system:
PBS for (a).
0.005M Phosphate, pH 8.0 and 0.18M Phosphate,
pH 8.0 for (b).

5,432,099

Calculation of FITC:Protein coupling ratio:

(2.87 × OD495nm) / (OD280nm - 0.35 × OD495nm)
I claim: 

1. A method for determining the ambient concentrations
of a plurality of analytes in a liquid sample of
volume V liters, comprising:
loading a plurality of different binding agents, each
being capable of reversibly binding an analyte
which is or may be present in the liquid sample and
is specific for said analyte as compared to the other
components of the liquid sample, onto a support
means at a plurality of spaced apart small spots
such that each spot has a high coating density of
one of said binding agents but not more than 0.1
V/K moles of binding agent are present on any
spot, where K liters/mole is the affinity constant of
said binding agent for said analyte;
contacting the loaded support means with the liquid
sample to be analyzed, such that each of the spots
is contacted in the same step with said liquid sample,
the amount of liquid used in said sample being
such that only an insignificant proportion of any
analyte present in said liquid sample becomes
bound to said binding agent specific for said analyte; and

measuring a parameter representative of the fractional
occupancy by said analyte of said binding
agents at the spots by a competitive or non-competitive
assay technique using a site-recognition
reagent for each binding agent capable of recognizing
either the unfilled binding sites or the filled
binding sites on said binding agent, said site-recognition
reagent being labelled with a marker enabling
the amount of said reagent in the particular
location to be measured.

2. A method as claimed in claim 1, wherein each of
said spots has a size of less than 1 mm².

3. A method as claimed in claim 2, wherein each of 
sarispou contains more than 10* molecules of Sing 

4. A method as claimed m claim 3, wherein each of « 
«d spou ha, 1« to 0.01 V/K ^ of 2*J 60 

5. A method as claimed in claim 3. wherein said bind- 

"fr^r^Xr ConiUn " for said^nalSes 

f from 1C to I0>3 liters per mole. Mbytes 

65 
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6- A method as claimed in claim 3 wh*r«* ..^ u: 
«ng agents used have ^^^^uh^^t 
of the rder of 10'0 r K^*^"" 1 

9. A method as clamed in claim 1, wberenVl^dhL 
termined. concentrations art to be de- 

conation levels of ^ B^-**.* 

11. A »;Ho<J u cluncd is chin 10 wkmi, „n 

binding agents and said site-re J™;*^ 

14. A device as claimed in claim 13 wherein «rK „r 
sarfspots contains more than i* moLutTf b^g 

" waling" r" 1 " b ? Ving l0Ca,ed " high 

coaung density at a p l urality of ^ ^ 

spots a plurality of different binding agents, ^ch 
^V^ 'T 8 Cap » b,e of reve^SrTuS 

„^ B SpCClflC for »« »»>yte as compared 
to the other components of the liquid sample, «ch 
spot having not more than 0 1 V/K mAuT "r 
single binding agent, where K bWn^eVL 
affinity constant of said single binding agent for 
reaction with the anajy„ to thich ^ sSc 
cS?? ° f SUn ^ d ^P'" conin^TSwn 
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A method for determining the ambient concentrations of a 
plurality of analytes in a liquid sample of volume V liters
comprises

loading a plurality of different binding agents, each being
capable of reversibly binding an analyte which is or
may be present in the liquid sample and is specific for
that analyte as compared to the other components of the
liquid sample, onto a support means at a plurality of
spaced apart locations such that each location has not
more than 0.1 V/K, preferably less than 0.01 V/K,
moles of a single binding agent, where K liters/mole is
the equilibrium constant of the binding agent for the
analyte;

contacting the loaded support means with the liquid
sample to be analyzed, such that each of the spaced
apart locations is contacted in the same operation with
the liquid sample, the amount of liquid used in the
sample being such that only an insignificant proportion
of any analyte present in the liquid sample becomes
bound to the binding agent specific for it, and
measuring a parameter representative of the fractional
occupancy by the analytes of the binding agents at the
spaced apart locations by a competitive or
non-competitive assay technique using a site-recognition
reagent for each binding agent capable of recognizing
either the unfilled binding sites or the filled binding
sites on the binding agent, said site-recognition reagent
being labelled with a marker enabling the amount of
said reagent in the particular location to be measured.
A device and kit for use in the method are also
provided.
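
The 0.1 V/K limit in the abstract can be illustrated numerically. The Python sketch below assumes simple single-site mass-action binding at equilibrium, which is a modelling assumption and not a statement of how the patent's measurements were made. It computes the largest amount of binding agent allowed on a spot, then compares the exact fractional occupancy with the "ambient" value K[An] / (1 + K[An]) that applies when the analyte is not noticeably depleted. The illustrative inputs (a 400 microliter sample and K = 1×10⁹ liters/mole) are the figures used for the anti-TNF antibody elsewhere in the document; all function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def max_moles_per_spot(volume_l, k_l_per_mol, factor=0.1):
    """Upper limit on binding agent per spot: factor * V / K moles."""
    return factor * volume_l / k_l_per_mol


def bound_conc(k, sites_m, analyte_m):
    """Equilibrium bound concentration b from K = b / ((sites-b)(analyte-b)).

    Uses the numerically stable form of the smaller quadratic root.
    """
    s = k * (sites_m + analyte_m) + 1.0
    disc = s * s - 4.0 * k * k * sites_m * analyte_m
    return 2.0 * k * sites_m * analyte_m / (s + math.sqrt(disc))


def exact_occupancy(k, sites_m, analyte_m):
    """Fraction of binding sites occupied, allowing for analyte depletion."""
    return bound_conc(k, sites_m, analyte_m) / sites_m


def ambient_occupancy(k, analyte_m):
    """Occupancy when the binding agent removes a negligible share of analyte."""
    return k * analyte_m / (1.0 + k * analyte_m)


V = 400e-6   # sample volume in liters
K = 1e9      # affinity constant in liters/mole
limit = max_moles_per_spot(V, K)
print(f"0.1 V/K = {limit:.1e} moles = {limit * AVOGADRO:.1e} molecules")

analyte = 1.0 / K  # analyte concentration of 1/K, chosen for illustration
for factor in (0.1, 0.01):
    sites = max_moles_per_spot(V, K, factor) / V  # site concentration, mol/L
    depleted = bound_conc(K, sites, analyte) / analyte
    print(f"{factor} V/K: exact F = {exact_occupancy(K, sites, analyte):.4f}, "
          f"ambient F = {ambient_occupancy(K, analyte):.4f}, "
          f"analyte bound = {100 * depleted:.1f}%")
```

With the spot held at 0.1 V/K the exact occupancy stays close to the ambient value and well under 10% of the analyte is bound; tightening the limit to 0.01 V/K brings the two occupancy figures closer still, which is the rationale the abstract gives for preferring the smaller amount.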



17 Claims, 1 Drawing Sheet 
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DETERMINATION OF AMBIENT 2 

iaocground of the invention ISSE?" ' e *°° wi * ""^SS, 

in sucb a weiJ-kDown work as -Mcihnrf«Ti • exam P Ie > is to be m e «ur,H ,h, 8 aDI,bod - v ) w °°se concentration 
Diagnostic Endoemolanr ^ Jr = '"^^Uve and * ™ £ i ™* ured ,bal » P res «< in the liquid sample under 

^-fc^i^'SSJg'JffSjJ: « ^ SUMMARY of the invention 

ujnamg agent is oeithcr ncccssiry for 
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good sensitivity io inanunoissav* ;. „ . 4 

oece«ry to «e an accurately c^^lTJoZl ««»««*»» -fStoS £tS?SL P ^ ena 
for all toe liquid samples (sUndlrd .hJ^LtoS enM * more «"« 3 . JR. Site £jLwT*? ° MSid - 

concentration using much kmZ^^SS^ 10 of ^ tolWorMVM1,h ^^ 

ations bSTS^ nL ? , aC ? pUWe - PraclicaI «^idcr. tb*i only an insigjficamnroSo? ^ Mmple *.cb 

1. will be appreciated ua, the .j^,, „ jo «• »d " Dfl «° "* buld «* 

.pplic.ion 2.W9478A. whiS or ^nuS ° B PMm * "P«scnu,ive of toe fW , 

relieson large amouDlsof biodin» . , * """"Hon occupancy by the analytes of toe fracUoMl 

requires only a veVy tew ™^ 8 ?' Ud Wnicb lhus said ""'"cognition reaeent be?™ i?£n / b " d " )8 a S e «. 

amounts of binding aeem is tv™,;...vi 01 SUCD small 40 The invention also provides > t 

to place toe bindS .fern S ' " beOWneS feasible *« an »>i«» concensus o? ?S T ™ de,ermin - 

meLrememTnTv^^ a -mple of vo u me "1'^ " ^ °' " 

hence lo place in juxupL^on to 7~ . ,u u SUpP ° n aDd SU PP° rI »"» "« ving located to*™ • ^ 

separate points on a S sol^s^^ ^ ' Spa " d apan ,oca "°«" vjf^ ^ 

different binding agentsTcifc fo r ^ V8rie,y of « " Ch bil,din 6 being capable of Z t 

are or may bTpresenSnf ? ertDI whicb aMl - vlE "1*0 * or may L d™ TV"** buidin 8 » 

poinu to toe liquid to beSSUL ^ ""L ^P 4 "' 1 DeD1S ° f ,he U ^ id ""pie. eStoSS h """P 0 - 
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curves provided (Jul the intibodv concentriiinnc *nH m. , u 

lions or multiples of 1/K. °' frac ** 10 «* mc » sur «l from tbc binding .gcnTor the 

BRIEF DESCRIPTION OF THE DRAWING Z^STV"^ b * eoen1 ' refleclive Bateri "* « 

^ 5 V^ tmta 1,1 "us cisc to enhance lieht cnlleninn ;„ ,k 

The pnnciplc underlying the method of the invention mav de,ec *« inst ™ n '«' « photographic Jae/n* S cnokc 
i^tf d * reference to .be accompanying ° f °P UmWD a ^ « governed by i te .bflitv to 
SSI^.L^ ,w ° sets of curves b»d»g V»| «o its surface, its absence of background signS 

ptoUmgtherelaUoi^pbe.^naDtibodycoDcentration.Dd emlss,0D *** ,te Possession of other properties tending^, 
^JS™ .° 0CUpinCy ° f ^ Wndin 8 sil « « «nain 10 ma5 ? m,se »P>al/noise ratio for tbe particular markeT ot 
^ . w 8 " Conceotrauons «J «* relationship markers a f cbed 10 b "rf»g agent situated on its surface 
^ "libc^y concentratwn and the percentage of anti- ?ry satisfactory results have been obtained in the Examoks' 
gen bound to me bmdmgsttes at me same prescribed antigen described Mow * ^ use of a white cmaque poS£ 
concenuauons. Each curve relates to the antibody concL- m ?°T pla " c °«««»«ci,Uy available Ln 52Sh 
EE £ AhT^? m ' ennS ° f 1/K " P' 0,led the » Und « r « ade nan >< Whi«e Microfluor microU.re ^ 
"7 For the set of curves which remain constant or , ^ bmdul g ™* may be binding aeeuts of dif 

decline with mcreasmg [AbJ, tbe y-axis represents the fercnl ^ that is to say agents which arVsLSL^ 
iractional occupancy (F) of binding sites oo the antibody b v d »ff«rem analytes, or two or more of them mav u.-j- 
the uUfmiu die second set, the y-axis represents mi a 8 enls of «° e s «»< specificity but of oiffe«n° ffiSv ,ht U 
percentage (be) of antigen bound to those binding sites. Tie » '° My *™ w "ich are specific to £ s™7« n ,Su have 
.ndmdua] I curves tn each set represent me relationships differ «" 'q^ium constants K for"Tata« , wJb ? Tfc 
corresponding to four dtfferen. antigen concentrations [Anl la,ter » h "»«iv« « particularly useful SS m TconienS 

o?iTir^ 0 I K ' r e,y r 10 ^ 10/K - °- i/k - ti0 ° of ana,y,e 10 66 —»« , '» «* sarnie • 

0.01/K. Tne curves show that as [AbJ falls F reaches an vary over considerable ranges, for example 2 or 3 eSenTrf 

esentully consttmt level, the value ofwhicb is dependent on * n»go..ude. as in the case of HCG meatmen in urmT of 

lADl P«P'»'*'omen.wberei,c,nv^f r om0.1 tolOOormore 

DETAU^DDESCRimON Tne "w, ; agents used will preferably be antibodies 

u Z ,Z $Uf ! POn * * Da,,er 10 * lefl » tbe m bX/m i monoc,00 « 1 "'ibodies. Monoclonal anti-' 

user. Preferably the support is non-porous so that tbe binding 30 * '° 8 ^ Vane 'y of '"g^dients of biological fluids 
agent is deposed on its surface, fo, example as a monolayer « «»nieic»lly available or may be made by known 
Use of a porous support may cause tbe binding agent ™ e aD , Ubodies used d«pl»y convention^ 

depending on ,ts molecular sizt, to be carried down into uW ° 0ns,aD ?' ,. for " affl P le 10* or 10 9 liters/mole 

poresoftbesuppon where i tt exposure to the analyte whose ,< S^.*?- ■?* ° f ' b !. 0rder of 10l ° « li>ers/mole, bm 
concentration is to be determined may likewise be affected 35 t g ffin " y anllbod,e s w iU> affinity constants of 10"-10» 
by tbe geomeuy of tbe pores, so that a false reading may be SS^. 0 " iU ° ** used - ^ »' v «"on can be used with 
obtamed. Porous suppons such as nitrocellulose paper dot- u b,Dd "* a ? eDls whicb » fe »o< themselves labelled 
ted w«b spots of btndmg agent « merefore less preferred. ."IThk"^ i>SO aDd desirable to u« 

Unlflte .be supports used in GB 2,099^78A, which seem to ^ bu,dm8 aeenls 50 u,a ' ,be *ysieni binding agenU 
need to be porous because of tbe large number of molecules 40 " a ^"'-"cognition reagent includes two different labek 
to be attached, tbe supports for use in the present invention same type, e.g. fluorescent, chemiluminesceni. 

use much smaller quantities and therefore need not b? '"^ ° r tldi °>*»°PK. one oo tbe binding agent and one 
porous. Tbe non-porous suppons may, for example be of Z It s " e - reco 6 Di,i ° D '"8«<- Tne measuring operauon 
plastics material or glass, and any convenient rigid plastics m , e&SUre$ tbe raUo of ,be imensi, y of tbe two signals and 

materia^ ~y be »«L Polystyrene is a preferred Jl, s ,i« 45 a b b " n !L Un h ,Da,es ,bt oeed 10 P' a « *e same amount of 
maienal. althougb other polyokfins or acrylic or vinyl JSw 8 JF? °° SUp P° n when »«»suring 
polymers could likewise be used. X s, 8 nals {ro <n standard samples for calibration purposes^ 

The support means may comprise microbeads, eg of S^" ™? , ™ rin8 * ig ™ ls llom ,hc "°known samples, 
such a plastics material, which can be coated with uniform , SySlem dcpcnds *° M y on ^asuremenl of a 

layers of binding agent . nd retained in specified locS™ 5 ° T'JST^ 11 "' ° f ° CCUpaDCy ' lb " e » 

e.g. boUows, on a support plate. Alternatively the malerS ZnZZ ,0 , meaSUre ,be S1 8 nal «be entire spot but 
may be » tbe form of a sheet orpUte which is spotted w],b " w fLuT™ ' S SUEcier ' L &eh bindin 8 »8«t is 

an amy of do|s of binding age„L It can be advama^eousfor ZTl d ^ ^ ' abe ' diffe " m ,abels 

me configurauon of the support means to be such that bquid s < £ „ , 

samples of approximately tbe volume V liters are readily „f \T B . "S"" 5 may bc a PP bed 10 lbe support in any 
retained in contact with tbe plurality of spaced .pan loca- h , WayS kn ° Wn or »nvonioMUy used for coating 
lions marked with the different binding agents. For example V"*™' 1 ^ CD \ ODto su PP°«s such as tubes, for example by 
the spaced .pan locations may be arranged in a well in the S^*™ 8 , t . • S ?- Ced apa " loca,ion 00 lhe su PP° rl w »h a 
support means, and a plurality of wells, each provided with „ rf< °, d, " B ag - cn ' in lhe form of a small drop, e . 

lhe same group of different binding agents in spaced apan J 1 r '° D a 1 a f ?P 01 - and a »° w i°g tbem to remain 
locations, can be bnked together to form a microtia p i ale ,w,v i V ° f timC before wasbin g ,b = drops 

for use with a plurality of samples. F away ^ roughly constant small fraction of the binding agent 

When tbe support means is to be used in conjunction with ^suTlnhic ^ ^"T 00,0 ** SU PP°« " a 

a me^ring system involving light scanning, tbe Serial « Sy of h " " '° ^ D °' Cd ,ha ' '»« cosiiag 

e.g. plasncs, for the suppon is desirably onaoue to Sh™ f~ ■ "l X.' D ' D : mE a6eD1 on ,he m i«ospo. does oot oced lo 
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accuracy bemg obtained, other things being ~ ua fc n « n f b °' ,wo fluores ««" mMw; bowe^" fluT«* 

indirect* UWfed 1* .teSSE? ^JLS™" 1 ^ k°' eXample ,wo """oiso.opes such L * », Tod' F ° r 

labelled directly or indirectly S£Z£glZS£ 11^ °° *« bub of.be differing eoeX * 

labels sucb as fluorescein, rtod^nTor T„« ^ res P ec " ve radioactive emissions Likev^L ?, ^ * ! 

materials usable in tioe-reih^ IX. «L Red 0r 60 s,ble 10 ideolif y products of .wo 7,,^.* ^ 
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such as blood, scrum, saliva or urine Tbev mav he n V H fx, , ^ 

.he assaying f a wide vane «y of to™ £ pm,^' wTaKowSl^ r^"* fc te °< 

enzymes r other anajytes which are either present nalS , 10 ma,bl,e for hours. About 400 

in the liquid sample oYmay be ~T?iSS v K 16,5 ° f SUodlrd contiinioTSown 

drugs, poisons or the like. ' ? " eb tt € ""^J"" (0.02, 02. 2 .od 20 ng/ml) of TW orlSS 

For example, the invention m.y be used to provide a uTcuo. e for ^ e pll \ ,0d ^j" 0 "* to 

dev.ee for quantitatively assaying , variely of JJ** J£ wells « then washed 

relating to pregnincy ud reproduction, such as FSH I H a. ... • • SOIUU0D - 

HCG. prolactin and steroid hormones (e-gpLVs^of' S no* .„ ThS"" ^ geD,S " ^ for ™F 
esradiol. testosterone and .ndrostene^ionf ), to W a 2^ C^^^^r 18 " 4ffini,V OTnsUD1 fof 

of the adrenal pituitary axis, such as Cortisol. ACTH and JUL - C - £.« bom ^^'^"rsAnole and for tbe HCG 

aldosterone, or tbyroid-related hormonal, ts^7 £ HCG « £ V*? U lffiai * «»»« 

and TSH and their binding protein TBG, or vinisls such S ,« fab^d with aS"' ^S^S"*"*- Both antibodies 

athletes' performance, or food contaminants. In each case ™i n n"? , fluorescen « ««>o of each spot is quantified 

thebindingagen^usedwillbespecificfortbc aoalytestofc Tl^™ MRC 50 ° coofocll miaoscopT 

assayed (as compared with others in tbe sample) and may be 20 i^Hr? !^ dose '"P 005 * ««* TNF 

monoclonal antibodies therefor. 7 and HCG are built up, tbe figures for TNF being as follows: 

Further details on the methodology are to be found in mv 

International Patent Publication WO88/01058, the contents ™— . Fnc fluoreictoct 

of wbicb are incorporated hereio by reference. >J™ TSSTCTOS^S- °» 7NF «P«" 

Tbe invention is illustrated by tbe following Examples 25 aia ~ 

EXAMPLE 1 2 *i 

20 42J 



An anU-TNF (tumour necrosis factor) antibody having an 

affinity constant for TNF at 25* C. of about lxlf/liiertJLi. , n J . 

is labeUed with Texas Red. A solution of the "* f ° f HCG bein 8 " fellow. 
concentration of 80 micrograms^! is formed and 0.5 micro 



wmie) filled polystyrene microutre plate having 12 wells 35 7T~" 

An ami-HCG (human chorionic gonadotropin) antibodv M I* 

having an affiuty constant for HCG at 25° C. of about 6xlO> 2 «o 

liters/mole is also labelled with Texas Red. Ablution of he 20 » 

ami-body a. , concentration of 80 micrograms/ml is formed ~ 

and 0.5 microliter abquots of this solution are added in the « The artificially produced solution was f„,,„H ,„ „• 

nours m a humid atmosphere to prevent evaporation of the response curves " * 

droplets. During this time some of tbe antibody molecules in 45 

^^^b^^Mo^^^Z^S^^ EXAMPLE 3 

are washed several times with a phosphate buffer' and then Vsing similar P'°«<J»res to those outlined in Example 1 

fcey are filled with about 400 microliters of a 1% albumen f °"™"re ph.c containing spots of labelled ."fi-VJ 

solution and kft for several hours to saturate the residual < ,b > TO)t ">0 ""body (afBnity constant about 1x10" lite^/ 

with pbospbate buffer. hormone) annbody (afiBnily constant about 5x10 s liters/ 

Theresuiangpl.tehwineacbofilswells.wospotseacb PI C) ^ ' abeUed an,i " T3 Criiodotbyronine) . 

of area approximately 1 mm 2 . Measurement of the amount r y constam aboul biers/mole «t 25* 

J, ^TZ™ fT ,hl1 " Mch weB one s l»' »"'ains fi," 5 Mch , 0f ' be ,Ddivid 1 u 2 al we "s » Produced, tbe spots 
about 5x10* molecules of anu'-TNF anubody and tbe other " f OD,ain,n f ie « ! baD J*"*"" V moles of anti-T4 antibodyor 

«u.. about 5xl0» molecules of anu-HCo" anubedy £ S?^ 10 ;" V "°'« °f anti-TSH antibody or less tnw 
wells are designed for use with liquid samples of volume " of anli - T3 antibody. 

400 microliters so that 0.1 V/K is 4xl0* J ' moles 7 _T hedevelo P in 8 a "'ibody (site-recognition reagent) for the 
(equnralen. to 2 .4x10" molecules) for the anU-TNF anu! VV" a ?' , '- TSH aD »"body with an affinity const m 

body and 7xl0-» moles (equivalent to 4x10'° molecules) 60 I s " of . 2x10 ° li'ers/mole a. 25' C. Tnis antib^y J 

for tbe anti-HCG antibody. labelled with fluorescein (FITC). The si.e-recoSon 

rMBeols for lbe T4 »««• T3 assays are T4 and T3 coupled to 

^PLE 2 P<»'>;;>y;- .nd labeUed with FITC. and tbey^coS .lie 

A microtia plate prepared as described in E« am „i, , • . " T , . ° D ,beir "W**"* *™ antibodies, 

used in an assay for .. Wjf52j SS co " J™ g 400 T° li,er ot "'"'ions co„. 

taming TNF and HCG. A test sample of tbe soluSn 8 Van ° US k ° 0WD amouols of T4 ' T3 aod TSH, dose 

Pie 01 the solution. response curves are obtained by methods analogous to tbost 
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fJ^ P «"«l»UDg fluorescence ratios with T4. 73 
. 2 55 n-mtrta* *used «o measure T4 73 
«Dd TSH kvek u jerun, from human patients with aood 
correlation with tbe results obtain by other meSs* 

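EXAMPLE 4

Using similar procedures to those outlined in Example 1,
a microtitre plate containing spots of first labelled anti-HCG
antibody (affinity constant about 6×10⁸ liters/mole at 25°
C.), second labelled anti-HCG antibody (affinity constant
about 1.3×10¹¹ liters/mole at 25° C.) and labelled anti-FSH
(follicle stimulating hormone) antibody (affinity constant
about 1.3×10⁸ liters/mole at 25° C.) in each of the individual
wells is produced, the spots each containing less than 0.1
V/K moles of the respective antibody. A cross-reacting
(alpha subunit) monoclonal antibody 8D10 with an affinity
constant of 1×10¹¹ liters/mole is used as a common developing
antibody for both the HCG and the FSH assays.

Using 400 microliter aliquots of standard solutions
containing various known concentrations of HCG and FSH,
dose response curves are obtained by methods analogous to
those in Example 2, correlating fluorescence ratios with
HCG and FSH concentrations, the curve obtained with the
higher affinity anti-HCG antibody giving more
concentration-sensitive results at the lower HCG concentrations
whereas the curve from the lower affinity anti-HCG
antibody is more concentration-sensitive at the higher HCG
concentrations. The plate is used to measure HCG and FSH
concentrations in the urine of women in pregnancy testing,
giving good correlations with results obtained by other
means and achieving effective concentration measurements
for HCG over a concentration range of two or three orders
of magnitude by correct choice of the best HCG spot and
dose response curve.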

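Production of Labelled Antibodies

The labelling of the antibodies with fluorescent labels can
be carried out by a well known and standard technique, see
Leslie Hudson and Frank C. Hay, "Practical Immunology",
Blackwell Scientific Publications (1980) pages 11-13, for
example as follows:

The monoclonal antibody anti-FSH 3G3, an FSH
specific (beta subunit) antibody having an affinity constant
(K) of 1.3×10⁸ liters per mole, was produced in the
Middlesex Hospital Medical School and was labelled with
TRITC (rhodamine isothiocyanate) or Texas Red, giving a
red fluorescence.

The monoclonal antibody anti-FSH 8D10, a
cross-reacting (alpha subunit) antibody having an affinity constant
(K) of 1×10¹¹ liters per mole, was likewise produced in the
Middlesex Hospital Medical School and was labelled with
FITC (fluorescein isothiocyanate), giving a yellow-green
fluorescence.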

1.a. Ammonium sulphate purification

1. Add 4.1 ml saturated ammonium sulphate solution to 5
ml antibody preparation (culture supernatant or 1:5
diluted ascites fluid) under constant stirring (45%
saturation).

2. Continue stirring for 30-90 min. Centrifuge at 2500
rpm for 30 min.

3. Discard the supernatant and dissolve the precipitate in
PBS (final volume 5 ml.). Repeat Steps 1 and 2, OR

4. Add 3.6 ml saturated ammonium sulphate (40%
saturation) under constant stirring. Repeat Step 2.

5. Discard the supernatant and dissolve the pellet in the
desired buffer.

6. Dialyse overnight in cold against the same buffer
(using fresh, boiled-in-d/w dialysis bag).

7. Determine the protein concentration either at A210 or
by Lowry estimation.

1.b. T-gel Chromatography: (Buffer: 1M Tris-Cl, pH 7.6.
Solid potassium sulphate)

1. Clear 2 ml of ascites fluid by centrifugation at 4000
rpm.

2. Add 1M Tris-Cl solution to achieve final concentration
of 0.1M.

3. Add sufficient amount of solid potassium sulphate.
Final concentration = 0.5M.

4. Apply the ascites fluid to the T-gel column.

5. Wash the column with 0.1M Tris-Cl buffer containing
0.5M potassium sulphate, until protein profile (at
A280) returns to zero.

6. Elute the absorbed protein using 0.1M Tris-Cl buffer
as the eluant.

7. Pool the fractions containing antibody activity and
concentrate using Amicon 30 concentrator.

8. If HPHT purification is to be carried out, use HPHT
chromatography starting buffer during Step 7.

2. Labelling of Antibodies: FITC/TRITC conjugation

1. [illegible] purified IgG protein in 0.25M Carbonate-
bicarbonate buffer, pH 9.0 to a concentration of
[illegible].

2. Add FITC/TRITC to achieve a 1:20 ratio with protein
(i.e. 0.05 mg for every 1 mg of protein).

3. Mix and incubate at 4° C. for 16-18 hrs.

4. Separate the conjugated protein from unconjugated by:
a. Sephadex G-25 chromatography for FITC label,
b. DEAE-Sephacel chromatography for TRITC.

Buffer system:
PBS for (a).

0.005M Phosphate, pH 8.0 and 0.18M
Phosphate, pH 8.0 for (b).

Calculation of FITC:Protein coupling ratio:
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EXAMPLE 4 

Regents 

1 SSf N3,, ' oni, ,ns,i,u,t for Bio1 ^' 

5 Phosphate buffer, 0.1M, pH 7 4 

6 Tris-HCl buffer. 0.05M, pH 7.6, containing 0.5% bovine 
ttdTmtr (BSA)> ^Tweenlo^ai* 

7 0 4 itT,! er: P ,n Spb J *' e buffer - 0JM - P« 7 -4. containing 
O.mTween 20 and 0.1% sodium azide ^ 
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8 Blick microtitre strips from Dynatecb 

9 Superfilock from Pierce 

A. Protocol and Conditions for tbe Radioimmunoassay of 
Thyroid Stimulating Hormone (TSH) ' 01 

1. An^iquot of 50 * of SO^mJ anti-TSH monoclonal 
•ntibody in phosphite buffer was ,dded to micS 
weUs and incubated for 1 hour a, room tem^ 

2. Toe microti Ire wells were wach^ri uwn. u 
buffer, blocked with Su^SforTrnt^ 
room temper. rure and then washed again 

3. £ aliquot of 100 fi of TSH standards made up in 
l S Z t ™*! aa <V ield concentrations of 0 

&o£!d 8xl(re> 12x10 "' 16xl " ' 'and' 

I-labelled TSH in Tns-HC) assav buffer were adderi 
•o triplicate anu-TSH monoclonal antibody <££ 
m.crotitre wells, shaken for J hour « room 
temperature washed with wash buffer and counted"" 
radioactivity. The concentration of TSH "n the 
unknown samples can be read from the standard curve 
The incubation period of 1 hour for the assay is faT^L' 
than tbe time required for the binding reaction in ? 
couib-brium. but. provided the standaxTa r™™ d 

f e antibodv will of c^&SZ&S^™* 
hour mcubauon and under the same conditio P ns asl« ^ v 

say Performed Under the Conditions Described in (A) 

1 . An abquot of 50 Al of 50 ugjta\ anti-TSH , , 
antibody in phosphate bufieT was addVd U fl 
welU and incubated for , hour at rcTte^"^ 

2. Tbe microtitre wells were . *.u \ 
buffer blocked with ^St^f 
room temperature and then washed again 

3. £ aliquot of 100 * of TSH standards made up „ 
TSH-free scrum (to vield final r«««- . ? 

and 20x10- M/L) and W^l of 4*Eltod ™ 
Tris-HCI assay buffer were add«J .„ , i H 1D 

. . . rwereaadcd 'otnplicaleanubodv 

coated microiiire weUs. shaken for 1 hour at roTm 
temperature, washed with w«h h„ff. I 
radioactivity. ^ b " ffer md °° UDled for 

4. A standard Scatchard plot of Bound/Free vs Bound
TSH was used to obtain the affinity constant K of the
monoclonal anti-TSH antibody.
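
The Scatchard analysis in step 4 can be sketched as a straight-line fit. For a single class of binding sites, Bound/Free = K × (Bmax - Bound), so a plot of Bound/Free against Bound has slope -K. The Python example below is a minimal sketch under that single-site assumption; the data are synthetic values generated inside the script, not measurements from the source, and the function name is hypothetical.

```python
def scatchard_fit(bound, free):
    """Least-squares line through (Bound, Bound/Free).

    Returns (K, Bmax), where K = -slope and Bmax = intercept / K.
    Concentrations are in mol/L, so K is in liters/mole.
    """
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two matched data points")
    ratios = [b / f for b, f in zip(bound, free)]
    n = len(bound)
    mean_x = sum(bound) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in bound)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bound, ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    k = -slope
    return k, intercept / k


# Synthetic single-site data (illustrative values only).
true_k = 1e10        # liters/mole
true_bmax = 1e-10    # mol/L of binding sites
free = [2e-11, 5e-11, 1e-10, 2e-10, 5e-10]
bound = [true_bmax * true_k * f / (1.0 + true_k * f) for f in free]

k_est, bmax_est = scatchard_fit(bound, free)
print(f"K    = {k_est:.3e} liters/mole")
print(f"Bmax = {bmax_est:.3e} mol/L")
```

With real counts the points scatter about the line, and curvature in the plot would suggest that the single-site assumption does not hold for the antibody preparation in question.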

<0 C 1 ^ *° AaJOum of ^P'"" Antibodv 
=0.1 V/K and Deposited on the Solid-Phase as MimZ 
Since tbe assay volume V is 0 2 ml or ^ffl* 
affinity constant K of the anu-TSH ca"mre antibody u^d 
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aspirated instantly. This procedure resulted in .mite*, 
***** with . coated area J^SSg^ 



Molar .moua of eo«.d ufcody oa ou«»« " 
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-0-1 V/K 



Or a capture antibody concentration of 9xl0"> M/L 
Assay Protocol: 

1. A 0.5 /J droplet of a monoclonal ami Tsw . 



60 



65 



or a capture antibody concentration of 0.85x10-" 
2. The micro litre wells were wa «J,^ «„-,u . T* 

3. 100/d of T5H sundards (made ud in TSH fr~ . x 
or unknown samples phis 100 ? of t£ M ™ m) 
buffer were added to plicate o&Sta^,ffi 
foM hour a. room temperature and washed 5b wash 

and complementary to the czr^r? H * o]t *tit 
r icrospo, on ^SUSSSSS 

ssr wasbed as 

amount of fl UO reIc?ncron .h? m ° 1CraSp0,S " Dd ,he 

spot is going to be less than 0J m Ub0dy **** °° 
What is claimed is- 

liquid sample of volume V bUtfe . ^ " * 
loading a piuralily of different bindine aeents «rh h.- 

onto a support means al a plurality of «i^ P * 

ST" SUCh ' ba ' D0 ' ^ vKlesTf 
nodiog agent are present on any spot where K M^/ 

CODS,40t of ^dt^'S 

contacting tbe loaded support means with ih s «„ -i 
sample to be ana ly zed, Eh -ha, each o f.he spo s Is 
contacted ,n the same step with said liouid LVmrt^i. 
amount of liquid used in said samp ? 
only an msignifican, proportion of ^ny 
in said liquid sample becomes bound to «iH 
agent specific for said analyw b,Dd, ° 8 

sites or tbe filled binding sites on said binding agent 
said sne-recognition reagent being labelled with , 
marker different from the marker onlid 5Xg 
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measuring a r..«o of signals boa said markers oo the siic 
recognition reagent and the binding reagent fon, „ 
least a pan of the spot, from which the analyte , 0 
interest is determined. 

2. A method according to claim 1, wherein the markers oo 

trssK rei8en ' ud ,he « 

3. A method according to claim 2, wherein the ratio of
signals is measured using a laser scanning confocal
microscope.

4. A method for determining the fractional binding site
occupancy of a plurality of binding agents by a plurality of
analytes in a liquid sample of V liters, comprising:
(a) loading a plurality of different binding agents, each
being capable of reversibly binding an analyte which is
or may be present in the liquid sample and is specific
for said analyte as compared to the other components of
the liquid sample, onto a support at a plurality of spaced
apart small spots such that each spot has a high coating
density of one of said binding agents but not more than
0.1 V/K moles of binding agent are present on any one
spot, where K liters/mole is the affinity constant of said
binding agent for said analyte;
(b) contacting the loaded support with the liquid sample
to be analyzed, such that each of the spots is contacted
in the same step with said liquid sample, the amount of
liquid used in said sample being such that only an
insignificant proportion of any analyte present in said
liquid sample becomes bound to said binding agent
specific for said analyte; and

(c) thereafter contacting the loaded support with
site-recognition reagents which recognize either the unfilled
binding sites or the filled binding sites on said binding
agent, the site-recognition reagents being labelled with
markers from which the fractional binding site
occupancy for each binding agent is determined.

5. The method of claim 4 wherein th, «,-,. 
rcagenjs are .abe„ed with n^r m tleT" reC ° 8Dm0n 

6. The method of claim 4. wberein the presence of ih, 

7. The method of claim 4. wberein the presence of me 
the determined value of the f racl ional binding Le S 
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loading a p Jurali ,y of different binding agents, each bd«» 
capable of reversibly binding an Lalyte 
my be present in the liquid sample and is specific £ 

SliSH?" iS , C ° mplrcd to other comS of 
the liquid sample, onto , support means at a ptaraS of 

spacedapansmallspo^sucb tha, each spo,bas . 

coating density of one of said binding .Vents but3 

more tb, n o.l V/K moles of binding Jeft m !££ 

stant of said bmdmg .gent for said analyte- 

C °sam D Te 8 IO , t l0,d I ed ^ *e W 

sample .o be analyzed, such that each of the spouis 
contacted in the same step wi.b said liquio s^K 
amount of bquid used in said sample bemgTucb £ 
only an insignificant proponion of .nv analyte^sent 

1", " qW fi d r PlC beC0,DeS *»* "» -5 Eg 
agent specific for said analyte; 

contacting .he sup po„ wju, , ^ 
specific for each binding agent in a 'competing o 
oon^ompeuuv, ^technique, the site-recognition relgem 
sues or ue fiUed bl0ding ^ on 

mlr; M T 8WU0D rea8em ^ » 

measuring the signal from .be marker of .be si.e- 
recognmon reagent in a particular location lo detect the 

10 P rT^ f "1 Plm4li, - V 0f « "id sample 

10 A me.bod as claimed in claim 9, wherein each ofsaid 
spote has a size of less than 1 mm 3 

U.Ametbod as claimed in claim 10 wherein «r+„r. -a 

12. A method as claimed in claim 11. wherein each of s^id 

13. Ametbod as claimed in claim 11 wher^JjH k ..• 

16. A method as claimed in claim 11 wh^;«*.k i 
of said Hquid sample is 400 ,o ^ 

17. A method as claimed in claim 9, wherein said binding
agents loaded onto said support means are antibodies for the
analytes whose concentrations are to be determined.
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TbepreseoiinvenUoo provides meth~4.f .. 
concentration of analvt« £ t- Be . , J bods fw detenainiog the 

amount of bJLg^TCn^^^ " Whicb ^ 
given analyte i„ ti b aui d «™i Ug for » 

an array of spati.Uyl»,„ le ^ ^ geW ***** i«o 
concentrationof tel^JSKI The 
occuped binding agent ii£ mSSS^'"'^* ,he 
marker and integrating the signal T* 4 

array. The present intention T also nm^a ^ ,U ° D " 
determining a value reprei Dtt ^ ST?" " *>r 
sues of the binding agentwhk* .~ ^* CUOn of bindin K 
comprising imJSg^ " by the analy,.* 

wlidsur^n,wberein^sDVri^^- md,ng 00 • 
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biding agen, is dhS info "n a™ W f here,n 

locations; contacting i"^^2^Wed 

coD.actmgtbesupponv^tbU.edevefal^ S " Bpk; 
0OD-specific,Uy bo UDd developt 3^?,^^ 
«gnal a. each of the location? ,c £ * ndnlMsun °g U>e 
represents the fraction of^e ZIL "™ ' Va,Ue whi «* 
•oalyte a> each location; and aS E T e " by * he 

Provide a total signal which 8 me,SUrcd v * )ues «o 
binding sites 0 f .bf bLing a Be « o<^ ^ ° f "* 

Test kits and devices used in „ P,ed * "* ""'y*- 

also disclosed. U praCUcul 8 'bese methods are 

21 Claims, 3 Drawing Sheets

This application is the U.S. national stage of
PCT/GB94/02814, filed Dec. 23, 1994.

FIELD OF THE INVENTION
The present invention relates to binding assays, e.g. for
determining the concentration of analytes in liquid samples.

BACKGROUND TO THE INVENTION 
It is known to measure the concentration of an analyte,
such as a drug or hormone, in a liquid sample by contacting
the liquid with a binding agent having binding sites specific
for the analyte, separating the binding agent having analyte
bound to it and measuring a value representative of the
proportion of the binding sites on the binding agent that are
occupied by analyte (referred to as the fractional
occupancy). Typically, the concentration of the analyte in the
liquid sample can then be determined by comparing the
fractional occupancy against values obtained from a series
of standard solutions containing known concentrations of
analyte.

In the past, the measurement of fractional occupancy has
usually been carried out by back-titration with a labelled
developing reagent using either so-called competitive or
non-competitive methods.

In the competitive method, the binding agent having
analyte bound to it is back-titrated, either simultaneously or
sequentially, with a labelled developing agent, which is
typically a labelled version of the analyte. The developing
agent can be said to compete for the binding sites on the
binding agent with the analyte whose concentration is being
measured. The fraction of the binding sites which become
occupied with the labelled analyte can then be related to the
concentration of the analyte in the liquid sample as
described above.

In the noncompetitive method, the binding agent having
analyte bound to it is back-titrated with a labelled developing
agent capable of binding to either the bound analyte or
the occupied binding sites on the binding agent. The fractional
occupancy of the binding sites can then be measured
by detecting the presence of the labelled developing agent
and, just as with competitive assays, related to the concentration
of the analyte in the liquid sample as described
above.

In both competitive and noncompetitive methods, the
developing agent is labelled with a marker. A variety of
markers have been used in the past, for example radioactive
isotopes, enzymes, chemiluminescent markers and fluorescent
markers.

In the field of immunoassay, competitive immunoassays
have in general been carried out in accordance with design
principles enunciated by Berson and Yalow, for instance in
"Methods in Investigative and Diagnostic Endocrinology"
(1973), pages 111 to 116. Berson and Yalow proposed that
in the performance of competitive immunoassays, maximum
sensitivity is achieved if an amount of binding agent is used
to bind approximately 30 to 50% of a low concentration of
the analyte to be detected. In non-competitive
immunoassays, maximum sensitivity is generally thought to
be achieved by using sufficient binding agent to bind close
to 100% of the analyte in the liquid sample. However, in
both cases immunoassays designed in accordance with these
widely accepted precepts require the volume of the sample
to be known and the amount of binding agent used to be
accurately known or known to be constant.



In International Patent Application WO84/01031, I disclosed
that the concentration of an analyte in a liquid sample
can be measured by contacting the liquid sample with a
small amount of binding agent having binding sites specific
for the analyte. In this method, provided the amount of
binding agent is small enough to have only an insignificant
effect on the concentration of the analyte in the liquid
sample, it is found that the fractional occupancy of the
binding sites on the binding agent by the analyte is effectively
independent of the volume of the sample.

This approach is further refined in EP304,202, which
discloses that the sensitivity and ease of development of the
assays in WO84/01031 is improved by using an amount of
binding agent less than 0.1 V/K moles located on a small area
(or "microspot") of a solid support, where V is the volume
of the sample and K is the equilibrium constant of the
binding agent for the analyte.
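
The effect of keeping the binding agent below 0.1 V/K moles can be illustrated numerically. The sketch below is not taken from the cited applications: it assumes simple 1:1 mass-action binding at equilibrium, and the function name, the affinity of 1e10 L/mol and the other numbers are hypothetical values chosen only for illustration. It computes the fractional occupancy and the fraction of analyte removed from solution for the same amount of binding agent in two different sample volumes.

```python
import math

def equilibrium_binding(K, analyte_conc, agent_moles, volume):
    """Return (fractional occupancy, fraction of analyte bound) at equilibrium.

    Assumes 1:1 mass-action binding: K = [bound] / ([free analyte][free sites]).
    K in L/mol, analyte_conc in mol/L, agent_moles in mol, volume in L.
    """
    a = K * analyte_conc          # total analyte, in units of 1/K
    b = K * agent_moles / volume  # total binding sites, in units of 1/K
    s = a + b + 1.0
    # The smaller root of x^2 - s*x + a*b = 0 is the bound complex (units of 1/K).
    x = (s - math.sqrt(s * s - 4.0 * a * b)) / 2.0
    return x / b, x / a

K = 1e10         # hypothetical affinity, L/mol
analyte = 1e-11  # hypothetical analyte concentration, mol/L
agent = 1e-15    # mol; equals 0.01 V/K for V = 1 mL, below the 0.1 V/K limit

occ_1ml, bound_1ml = equilibrium_binding(K, analyte, agent, 1e-3)
occ_2ml, bound_2ml = equilibrium_binding(K, analyte, agent, 2e-3)
print(occ_1ml, occ_2ml, bound_1ml, bound_2ml)
```

Under these assumptions less than 1% of the analyte is bound, and doubling the sample volume changes the fractional occupancy by well under 1%, which is the volume independence described above.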

In WO93/08472, I disclosed a method of further improving
the sensitivity of binding assays by immobilising small
amounts of binding agent at high density on a support in the
form of a microspot. In this assay, a developing agent
comprising a microsphere containing a marker, e.g. a fluorescent
dye, is used to back-titrate the binding agent after it
has been contacted with the liquid sample containing the
analyte. As the microsphere can contain a large number of
molecules of fluorescent dye, the sensitivity of the assay is
improved as the signal from small amounts of analyte can be
amplified. This amplification permits sensitive assays to be
carried out even with microspots having an area of 1 mm²
or less and a surface density of binding agent in the range of
1000 to 100000 molecules/µm².

SUMMARY OF THE INVENTION
The present invention provides a method, device and test
kit for carrying out a binding assay in which binding agent
having binding sites specific for a given analyte in a liquid
sample is immobilised in a test zone on a solid support, the
binding agent being divided into an array of spatially
separated locations in the test zone, wherein the concentration
of the analyte is obtained by integrating the signal from
the locations in the array.

Accordingly, in one aspect, the present invention provides 
a method for determining the concentration of an analyte in 
a liquid sample comprising: 

(a) locating binding agent having binding sites specific for
the analyte in a test zone on a solid support, the binding
agent being divided into an array of spatially separated
locations;

(b) contacting the support with the liquid sample so that
a fraction of the binding sites at each location become
occupied by analyte;

(c) measuring a value of a signal representative of the
fraction of the binding sites occupied by the analyte for
each individual location in the array;

(d) integrating the signal value obtained for each location
in the array to provide an integrated signal; and,

(e) comparing the integrated signal to corresponding
values, obtained from a series of standard solutions
containing known concentrations of analyte, to determine
the concentration of the analyte in the liquid
sample.
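
Steps (d) and (e) can be sketched as a short calculation. This is only an illustrative sketch, not the method as implemented in the specification: the function names are hypothetical, the signal values are invented, linear interpolation between standards is an assumption, and the signal is assumed to rise monotonically with concentration (as in a non-competitive format).

```python
def integrate_signal(location_signals):
    """Step (d): add the signal values measured at each location in the array."""
    return sum(location_signals)

def concentration_from_standards(integrated_signal, standards):
    """Step (e): read a concentration off a standard curve by linear interpolation.

    standards: (known concentration, integrated signal) pairs; the signal is
    assumed to rise strictly monotonically with concentration.
    """
    curve = sorted(standards, key=lambda pair: pair[1])
    for (c_lo, s_lo), (c_hi, s_hi) in zip(curve, curve[1:]):
        if s_lo <= integrated_signal <= s_hi:
            return c_lo + (c_hi - c_lo) * (integrated_signal - s_lo) / (s_hi - s_lo)
    raise ValueError("integrated signal lies outside the standard curve")

# Invented example values (arbitrary signal units, arbitrary concentration units).
standards = [(0.0, 0.0), (1.0, 100.0), (2.0, 180.0)]
spot_signals = [30.0, 35.0, 35.0, 40.0]

total = integrate_signal(spot_signals)
estimate = concentration_from_standards(total, standards)
print(total, estimate)
```

A competitive format would give a falling curve; the same interpolation applies once the standards are sorted by signal, since each segment is handled independently.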

Thus, in the present invention, the values of the signal 
from an array of locations in the test zone are used to 
determine the concentration of a single analyte. This is in 
contrast to the approach described in EP304,202, in which
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be viewed sequeniully, e.g. using , eonfocal mi<™^ nwrospot leads to i reduction in toe seosit^iiv tttl 

binding agent in the lest zone cu be viewed togeie? e « ^ ,0Ul coa,ed «• occupied bX 

using a charge coupled device (CCD) camerTwfth <£' tDU "- micros P 0,s . «»d beoce the total amoWtf fabfe? 

ago. vines from each location Wm^ed Sfj? ^ " ^ m 

neously. ^ simulta- In addition, the use of an array of micrtKnoi, r ^. 

ftefer.bly.Aesign.lsrepresem.twcoftbefraaionofthe " ^ ^ ,0 ^^wbeL^ 

«»^'^«^bybi^ igaBlIerth ^^ obtained for ,ny given ^i^, j, u e £ r « 

measured by back-utrating tbe binding agen. with ^ ^ miaOS P ots » «*«n»y. 

opmg^ent having . mtfker> Me deve ^ ^ ' preSem invention provides, device 

capable of b,nd,ng to unoccupied binding sites bound u ! , ^'^^""""^""""f^ornK.re 

analyte or to occupy binding sites in r^Jg* " n 'Z M ° ,P,e ' ,hC deWce ■ sol^S 

noncompetitive method, as described lbove ^ £™ 8 ^ 0r . more "»«. each test zone S 

isotope, an enzyme, a cbemiluminesceni marker or a fluo £5- ^ °Z 3 gIVcn awlvle in » ^"id sample the 
rescen. marker. Tbe use of fluorescent dye m akers ° ,n * b " D8 divided in '° " «™v of spaUaUv 

° ^ fl T^ CeD0C 05 U WP*« »lour Sgt Sesf^«V, ^ * ° b,iiDed b * ^8»tingTg£l 

(excitation and emission wavelength) for detection Run vlUues from each locanon in the array. B ^ 

resoent dyes include coumarin, fluorescein, rbodamine and h „ 3 l^ 6 ' "P 6 ", «* P™*"" invention provides a kit for 

Texas Red. Fluorescent dye molecules taving^XS 25 EST*. «* one or more an"ytes in . 
fluorescent periods can be used, thereby allow^g S "* ta """Prising: ^ " * 

resolved fluorescence to be used to measure tbe streneth „f (a) 4 deviee comprising a solid support h.vino „„. 

II J^° CWU ? , eiCb n,oIecule ^ve'loping «en, SepMa,ed teca "' ons in <°< *« »ne° «d P * 
ampwying ue signal produced by tbe develop (b) one or more develonino »o*m f r . . 

Preferably, the ocauons .re microspots and "be asLv is £raclioD °f «e bindCV.w of SL^S?" ,U, « ,be 

n7« y r u re i e,led - ' • 15 D »i-«n«CTOspots" in the relevln wherein Ibe concentration of , given analvte i. nh, • „ u 
pans of the description that follow. re,eVM ' w integraung signal values from t£ ^LteS ^ iSLS m * 

piu»KT.ni^ «-■««- of. a8enl 41 MCh *• - m r kmolMo ^ 

providing a plurality of test ■S^pSTfiS BRIEF DESCR1PT 10N OF THE DRAWINGS 

association constant for analyte bindino i« ik. u j- ci<- •> 

This ensures that, be "^ZS?^"^'^ 11 - Jl, I""™™ the VP** variation in signal-to-noise 

in WO83/D1031 are fulfilled ^reg2L™?^al v T ,bed nr ° f 3 m,CT0SP °' Cbanges; 

centration. gUat " ° f ,be aMl 3« «»- FIG. 3 represents how diffusion constraints on analvtc 

d^eTcca^^m^ " ^AST 1 a8 "' " ^ - - ^ 

comparable totbe ^^^SS^XSSS J' 0 4 ^ b ° W ^ ■ »*J-1 —men, 

in the c« of microspots typically prevfd Z £ «?* " ' bC ™ ° f micros P°' cbaoges; 

w ,bou ' 80 a**-^ * jyrtsa * of s?- 5 sbows 8 con,paris ° D be,w « n «-« «» 
3r 0 r Di — - - — 0 ,h,t .! -'S^s siro^r ray of 

ftiMSKt^'.^ DETAILED DESCRIPTION 

— by .e „ of^%- JSJKX;^™^ 
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pbysjco-cbemicil properties (ie association and dissociaUoo 
rate constants) of the binding agent, the viscosity of the
analyte-containing solution to which the microspot is
exposed, the specific activity of the label used, etc.

In all the figures, value A denotes the area of a microspot
typically used in the prior art (typically 1 mm²). In all the
figures, the density of binding agent is kept constant.

FIG. 1 shows the experimentally observed variation of
sensitivity as the area of a microspot is reduced. In the
present context, sensitivity can be defined as the lower limit
of detection, which is given by the error (s.d.) with which it
is possible to measure zero signal. As FIG. 1 shows, as the
area is reduced from value A, the sensitivity of the binding
assay reaches a maximum and then declines as the area of
the microspot is further reduced towards zero.

Some of the opposing factors leading to this observation
are depicted in FIGS. 2 to 4.

FIG. 2 shows how the signal-to-noise ratio associated
with the measurement of the occupancy of the binding sites
of the binding agent changes as the size of the microspot
is reduced towards zero, assuming equilibrium has been
reached. As microspot area is reduced from value A, the
fractional occupancy of the binding sites of the binding
agent reaches a plateau value as the concentration of binding
agent falls below 0.01/K. Therefore, the signal per unit area
from markers on developing agent used to measure the
occupancy of the binding sites by analyte will also reach a
plateau. As the background noise per unit area remains
approximately constant, so the signal-to-noise ratio will
likewise increase to a plateau value as the concentration of
binding agent falls below 0.01/K.

FIG. 3 shows how diffusion constraints change as the area
of a microspot is reduced. "Diffusion constraints" restrict the
rate at which analyte migrates towards and binds to the
binding agent. As FIG. 3 shows, the diffusion constraints
decrease as microspot size decreases, i.e. the kinetics of the
binding process are faster for smaller microspots, implying
that thermodynamic equilibrium in the system is reached
more rapidly.

At a molecular level, this phenomenon can be pictured as
follows. When a microspot containing binding agent is
exposed to a liquid sample containing analyte, the binding
agent binds analyte, depleting the local concentration of the
analyte as compared to the liquid sample as a whole. This
leads to a concentration gradient being established in the
vicinity of the microspot until thermodynamic equilibrium is
reached. This process is found to be slower for larger
microspots, the diffusion constraint being approximately
proportional to microspot radius. When the occupancy of the
binding sites on the binding agent has reached an equilibrium
value, the concentration of analyte in the liquid sample
is uniform. However, equilibrium is reached more rapidly in
the case of microspots of smaller size, implying that, for any
incubation time less than that required to reach equilibrium
in the case of the larger spot, the fractional occupancy of the
binding sites on the smaller spot is greater.
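
The radius dependence can be turned into a simple worked figure. The sketch below assumes circular spots of equal size and takes the diffusion constraint to scale linearly with spot radius, as stated above; the function name and the choice of 25 sub-spots sharing the area of one 1 mm² spot are illustrative assumptions.

```python
import math

def mini_spot_radius(single_spot_area, n_spots):
    """Radius of each of n_spots equal circular spots sharing the same total area."""
    return math.sqrt(single_spot_area / (n_spots * math.pi))

area_A = 1.0  # mm^2, area of the single microspot
r_single = mini_spot_radius(area_A, 1)
r_mini = mini_spot_radius(area_A, 25)

# If the diffusion constraint is proportional to radius, this ratio is the
# constraint on each small spot relative to the single large spot.
relative_constraint = r_mini / r_single
print(r_single, r_mini, relative_constraint)
```

On these assumptions each of the 25 small spots experiences one fifth of the diffusion constraint of the single spot, while the total coated area is unchanged.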

However, as microspot area decreases, so the amount of
binding agent and the level of signal from developing agent
will likewise decrease. This leads to an increase in the
statistical errors in the measurement of the signal from a
marker on a developing agent, which tend to infinity as the
microspot area tends to zero (see FIG. 4).
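
One way to picture this growth in statistical error is a counting model. The sketch below is an assumption made for illustration, not a model given in the specification: it treats the detected signal as a Poisson-distributed count proportional to spot area at constant binding-agent density, so that the relative standard deviation is one over the square root of the expected count. The function name and the count density are hypothetical.

```python
import math

def relative_counting_error(area, counts_per_unit_area):
    """Relative s.d. of a Poisson-distributed count collected from one spot."""
    expected_counts = area * counts_per_unit_area
    return 1.0 / math.sqrt(expected_counts)

density = 10_000.0  # hypothetical detected counts per mm^2
errors = {
    area: relative_counting_error(area, density)
    for area in (1.0, 0.1, 0.01, 0.001)  # spot areas in mm^2
}
for area, err in errors.items():
    print(area, err)
```

In this model the relative error rises without bound as the area shrinks, which matches the qualitative behaviour described for FIG. 4.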

It can be seen that a consideration of the signal-to-noise
ratio and diffusion constraints indicates an increase in the
sensitivity of a binding assay as the area of a microspot is
decreased. However, these factors are opposed by an
increase in the statistical error of signal measurement as the
microspot area decreases. These factors combine to produce
the observed variation of sensitivity with microspot area
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shown in FIG. 1. Thus, the overall coosequence sin*. » 

Effug* * uto - fc ays; A: 

hi^ZT'' il " desirablc 10 devel °P ■«■»« inimaruriaed 
bmdmg assays usmg microspots of tbe smallest poJbT&t 
containing vanishing* small amounts of bindmg^enL tba! 
haveupid kineucs to minimise the time takenl ca^S 

hin^n/' eSem i0V ! n " 011 proves sensitivity and reduces 
bmdmg assay mcubation times by exploiting the contr.dk? 

done by subdividing tbe total amount of binding weminto 
an array of spaually separated locationsluch If "mEi? 
microspots". to reduce diffusion constraints, and integrX 
the signals representative of the fractional occupancy of 
binding agen, at each location to obtain a Wttl 3K£ 
than would have been achieved by using a single nSS 
equal in area to the total area occupied by the S 
crospots comprismg the minimicrospot array 

This implies, inter aha, that tbe total amount of bindine 
agent used can be made even smaller than in p2r« 
where a ba lance between kineucs and signal -to-noSTreU- 
tive to suusucal errors had to be made ^aSSTiS. 
Uvtty Tbe present invention therefore car^Cprovf^ 

bound to binding agent, whilst reducing the dtfriisio. TccV 
strain* associated with each microfpot in ^.nay 
,he mcreasin 8 statistical errors observed £ ,£ 
prior an as microspo. sue is reduced are obviaied, as the 

X EFT", b0m ** occ, » ied bindin * s «« «»y antlS 
m .be .ndiv.dual microspot is integrated over fce arraTto 
provde an integrated signal, thereby retaining me™!." 
measuremen. advantage observed for larger mfcrospoT 

J?£* ,U "T lei b ° W 8 sia8,e ^"ospoi of tbe prior art 
can be d.v.ded in.o an array of 25 microspots conlamone an 
equivaleni loia] amoun. of binding agent maamui S «> 

in?»*"l MCSS : 0Uler maa &™** or geometries of bind- 
e^S P T ^ " SMyS yieldk8 <be same benefit can 
engaged, see for instance FIG. 6 which shows bindine 
agen, .mmobibsed as lines forming a grid (see the 3 
t h^-tt C0Dfi S uratl0n bas tbe effect of reducing 

the diffusion cons,rainis whilst maintaimng the total «ef 
coa,ed w,tb binding agen. (e.g. an antibody) TobSe™ 
mcreasnig s.a.tncal errors and associated toss of Sv£ 
observed as ,be amoun, of binding agent is reduced 

Tbe amoun. and distribu.ion of .be bindine aeent in iht 
ocauons comprising the array depends o7a & Z few of 
facors including ,be diffusion cbaracTeristics of tbe anaWte 

n a,v amre h °k VUC0Si,y ° f W »*P* »nuin" g the 
analy,e and .be pro.ocol used during incubation. However 
g»yen .he guidance here .he skilled person car? r3 
"P^imemally or by computer modelling! 

zzT^r gmtD> °' eeomciry ° r array r ° r a ° y « iv « 

EXAMPLE 

Conjugation of Anti-TSH (Anti-Thyroid
Stimulating Hormone) Mouse Monoclonal
Antibody to Fluorescent Hydrophilic Latex
Microspheres

0 ^ 0 ■fi M ? D, b y dr °P bi ^ »»'« microspheres in 

TWEEN 20 «i d f ,S " lled W4,ef WMe added 10 °- 5 °f 1« 
TWEEN 20, surface-aciive agent, shaken for 15 mio a. room 

icmperaiure and centrifuge* at 8« C. for 10 min at 20 «» 

rpm in a MSE High-Spin 20 ul.racentrifuge. 

2. The pellet was dispersed in 2 ml of 0.05M MES
(2-[N-Morpholino] ethanesulfonic acid) buffer, pH 6.1 and
centrifuged.

3. Step 2 was repeated.

5 Jtr l ! Cl ?™f Ptned *° M ml MES buffer 
mio « room JScr.ru!? """"P 1 *** «« ^alceo for IS 

shaken for 2 hours „ roo„ to ,De ««m and 

centrifuged e ° ,0rlbouri "«»»"eoper a « U reand 

11. Step 10 was repeated twice.

12. Tte pellei was dispersed in 2 ml of 
ing 0.1 sodium azide andstored a. 4« C BSAcomaul - 

(^mparfaon of Kinetics of Micro Versus Mini- 
(Thyro«J Sumuladng Hormone) Assay 

MicroFluor Microti,* weiu DyDa ' ecb black 

in.media.ely, ,be wells^^ We " 

Pierce for 30 min al roonTT^l SupcrBiock ^om 

0.1M pbospb„e buffer' pmr^ r Ure aDd Wasbed wilb 

m.te.y 100 pi picoli.er LTL^ tff^^'' 

coaled amibody density ST! * abovc " 
are es.im.ttd .o be 2xlf? IgS^ minj ^'crospo ls 

3. 200/il of plasma comaming 1 «U/ml nfrcu 
io .A me micro.i.re wells and shaken " ™ 
At 30, 60. 120 min and 18 h™,!, , 100113 '^peramre. 
con^ningmemicrosp^ai^i^ 11 '). fou ' ~* 
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containing the microspoTs and {0Vem & hl >> fou ' w 'Us 

tibody arc deemed us.ng a plurali.y of arrays on S * ~ P . ' 



incubated"^ 2W ^ tfSSw 7 7^ 20 ' 
conjugal ,o bydropbilie l»"x SL£! P1D8 
buffer (50 ^ml) for 30 ,? ^ ^Jl* " TriS - HC1 
wasbed wiibpbospbaUi.TWEOJ 20 ^3? ,e ° perature and 
Iben scanned wi.h , UmvSuS. f f ^ WeUS WWe 
equipedwi.b.oArgon^'oT^r Df0Cal """^ 
Results 



faster ki De ,ic for me association «f . n 1 ln, - nucrBS PO«s bave 
antibody, , od could ^S^SS^^ *«*«■■ 
The invention claimed is "cubauon umes. 

fa °' a ' 

each analyte, Uk steps ^ P ' ^ ""^ COn P risiD 6. for 

less .ban 0.1 V/K mol« Ik " P rMeDI 10 " amouot 
liquid samp e ana" K 1 ^ V * V ° luiDe of «>* 

agent, and wherein saT L^u^ ° ,0dil * 
divided into an arrav of „J?^, D10dln 8 ageni is 

w -uc^ r^s^-r ioca,ions; 

fraction of the bmlol si^ of L P U e W Ih2, a 

labelled developing i P n r ^- n a " g ln n>arkersucb lha ' '»e 

si-es, to spec/calfyt" 5'^ yro^ h b " di08 
sues w, M spedficuv « 10 & b.nchng 

( ^cKi^ ' d -,o pul g agen, 

io obuiL a value wb'ch t ™ loc ™°<** Ibe array 
binding si.es cxclj2t v ^eT" 1 5 , ,he ffaC,i ° D ° f ,be 
(e) adding .be values obTafced ., . ^ " l0Ca,i ° n; 
«o provide a total s.gnal ^d l0CaU ° DS " ' be aira y 

(0 -K^aS. o&ard 0 "^^ 8 

concen.ra.ion of (he analv^ ^in ^ y '°j le,emu ' e «»>e 
2. Tbe method acco rdinS %Z 'IT" 3 Sa * nple - 

"ion of a P lurab,v of d2 ,1 l^.}' Wbereui . lbe »«o- 
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timet (mim) 



30 
60 
120 

Ovtmight 



Microtpot Mint-micretpo, 



65x1 
118 > 35 
141 « 21 



111 « 13 
J«9* 16 
J78, j 6 
19] ,23 



Conclusion

Significantly higher mean respooses were oh,. ^ 
betweeo 30a Dd 120 m ins io the i^itS 
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are de.ermined using a Zraij ' ' ^ * ^ ^ s «"Ple 

specific bindmg agen. is aD 8 ,n,°bod v a nd .^"^ 0) ,be 
aotigen or (ii) lhe % ecific ^^'^^^ "« 
ot.de and tbe analy.e is a nucleic acid ol '6°™cle- 

eacb analyie. .be steps of: d coo P"s«ig. for 

(a) immobilizing a specific binding aeent incl u rf,n„ u- ^ 
»ng sues specific for the analvL ^ mc ™™& bu,d- 
wherein ,he specific binding S^t 
•he conceniraiion of.be anaJvie i 1^ d e'erm.ne 
less tban 0.1 V/K moles wh^re V k ,h ^ a^l0U, " 
liquid sample and K i^T VOiuDJe of ,be 

analy,e s^cifiSy bi d ogTr"^'^" ,he 
agen., and wbereL saTd fpecifi e b L'^' 

specif for me aoalyie sp8e -s ai 0 ^ ^ : agen, 

(c) cooiactjDg (he support with / h„ t . aDa,yle « 
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labelled developing agent binds to unoccupied bindine 
sites, to specifically bound analyte or to the binding
sites with specifically bound analyte;

(d) separating non-specifically bound developing agent
from the solid support and measuring the signal produced
by the marker at each of the locations in the array
to obtain a value which represents the fraction of the
binding sites occupied by the analyte at each location;
and

(e) adding the values obtained at the locations in the array
to provide a total signal which indicates the concentration
of the analyte in the liquid sample.

7. The method according to claim 6, wherein the specific
binding agent is divided into between 4 and 40 locations.

8. The method according to claim 6, wherein the locations
are in an area of about 10000 µm², the locations being
separated from each other by a distance of 100 to 1000 µm.

9. The method according to claim 6, wherein the concentrations
of a plurality of different analytes in the liquid
sample are determined using a plurality of arrays on said
support.

10. The method according to claim 6, wherein (i) the
specific binding agent is an antibody and the analyte is an
antigen or (ii) the specific binding agent is an oligonucleotide
and the analyte is a nucleic acid.

11. A method for determining the concentration of at least
one analyte in a liquid sample, said method employing a
solid support on which is immobilized, for each analyte, a
specific binding agent including binding sites specific for the
analyte, wherein the specific binding agent used to determine
the concentration of the analyte is present in an amount
less than 0.1 V/K moles, where V is the volume of the liquid
sample and K is the association constant for the analyte
specifically binding to the specific binding agent, and
wherein said specific binding agent is divided into an array
of spatially separated locations, said method comprising the
steps of:

(a) contacting the support with the sample so that a
fraction of the binding sites of the specific binding
agent specific for the analyte specifically binds the
analyte;

(b) contacting the support with a developing agent
labelled with a signal-producing marker such that the
labelled developing agent binds to unoccupied binding
sites, to specifically bound analyte or to the binding
sites with specifically bound analyte;

(c) separating non-specifically bound developing agent
from the solid support and measuring the signal produced
by the marker at each of the locations in the array
to obtain a value which represents the fraction of the
binding sites occupied by the analyte at each location;

(d) adding the values obtained at the locations in the array
to provide a total signal; and

(e) comparing the total signal to corresponding values
obtained from a series of standard solutions containing
known concentrations of the analyte, to determine the
concentration of the analyte in the liquid sample.

12. The method according to claim 11, wherein the
specific binding agent is divided into between 4 and 40
locations.

13. The method according to claim 11, wherein the
locations have an area of about 10000 µm², the locations
being separated from each other by a distance of 100 to 1000
µm.

14. The method according to claim 11, wherein the
concentrations of a plurality of different analytes in the
liquid sample are determined using a plurality of arrays on
said support.

15. The method according to claim 11, wherein the
specific binding agent is an antibody and the analyte is an
antigen.

16. The method according to claim 11, wherein the
specific binding agent is an oligonucleotide and the analyte
is a nucleic acid.
17. A method for determining a value representative of a
fraction of binding sites of a specific binding agent including
binding sites specific for an analyte which binding sites are
occupied by the analyte present in a liquid sample, said
method comprising the steps of:

(a) immobilizing the specific binding agent on a solid
support, wherein the specific binding agent used for the
fractional occupancy determination is present in an
amount less than 0.1 V/K moles, where V is the volume
of the liquid sample and K is the association constant
for the analyte specifically binding to the specific
binding agent, and wherein said specific binding agent
is divided into an array of spatially separated locations;

(b) contacting the support with the liquid sample so that
a fraction of the binding sites of the binding agent
specific for the analyte specifically bind the analyte;

(c) contacting the support with a developing agent
labelled with a signal-producing marker such that the
labelled developing agent binds to unoccupied binding
sites, to specifically bound analyte or to the binding
sites with specifically bound analyte;

(d) separating non-specifically bound developing agent
from the solid support and measuring the signal produced
by the marker at each of the locations in the array
to obtain a value which represents the fraction of the
binding sites occupied by the analyte at each location;
and

(e) adding the values obtained at the locations in the array
to provide a total signal which indicates the fraction of
the binding sites of the specific binding agent occupied
by the analyte.

18. The method according to claim 17, wherein the
specific binding agent is divided into between 4 and 40
locations.

19. The method according to claim 17, wherein the
locations are in an area of about 10000 µm², the locations
being separated from each other by a distance of 100 to 1000
µm.

20. The method according to claim 17, wherein the
fraction of occupied binding sites is determined for a plurality
of different analytes in the liquid sample using a
plurality of arrays on said support.

21. The method according to claim 17, wherein (i) the
specific binding agent is an antibody and the analyte is an
antigen or (ii) the specific binding agent is an oligonucleotide
and the analyte is a nucleic acid.

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

DECLARATION OF JOHN C. ROCKETT, Ph.D. 
UNDER 37 C.F.R. § 1.132 

I, JOHN COUGHLIN ROCKETT III, Ph.D., declare and 
state as follows: 

1. Since 1995 I have been engaged full-time in 
molecular toxicology research, with an emphasis on the 
application of expression profiling techniques, including but 
not limited to nucleic acid microarray expression profiling 
techniques, to studies of the mechanisms of toxicant action 
and to the design of assays to monitor toxicant exposure. 

2. My curriculum vitae, including my list of 
publications, is attached hereto as Exhibit A. 

3. For the past 5 years, my work has focused 
primarily on analyzing the effects of potentially hazardous 
environmental agents, such as heat, water disinfectant 
byproducts, and conazole fungicides on the male reproductive 
tract. Although we are interested in the basic mechanisms of 
action of such toxicants, we also have two practical goals in 
mind: first, to identify individual agents and families of 
agents that adversely affect male reproductive development and 
function, and second, to develop methods for monitoring human 
exposure to such agents, particularly methods capable of 
identifying toxicant exposure at an early stage. 

4. I have relied on expression profiling as a 
principal approach to these goals. Expression profiling, by
reporting the expression levels of thousands of genes
simultaneously, gives us an opportunity to identify and group 
toxicants based on similarities in the patterns of gene 
expression they induce in cells and tissues; the gene 
expression profiles induced by treatment with known testicular 
toxins serve as standards, molecular signatures or molecular 
fingerprints as it were, against which the patterns of gene 
expression induced by agents of unknown toxicity may be 
compared and judged. In addition, gene expression profiling 
may give us the opportunity to detect toxicity before more 
gross phenotypic changes become manifest. 

5. In keeping with this research emphasis, I have 
until recently: 

served on the Microarray Technical 
Subcommittee of the United States Environmental 
Protection Agency (EPA) Genomics Task Force, and 

served on the Scientific Committee for 
the conference series on "Critical Assessment of 
Techniques for Microarray Data Analysis," held 
annually at Duke University, Durham, NC; 

and I currently 



serve on the Technical Committee on the 
Application of Genomics to Mechanism-Based Risk 
Assessment of the International Life Sciences 
Institute's Health and Environmental Sciences 
Institute, 

serve on the Genomics and Proteomics 
Committee of the National Health and Environmental 
Effects Research Laboratory of the EPA's Office of 
Research and Development, 

belong to the [North Carolina Research] 
Triangle Array Users Group, 



belong to the Molecular Biology
Speciality Section of the Society of Toxicology,
and

belong to the Triangle Consortium for 
Reproductive Biology. 

In addition, I am the principal investigator on a cooperative 
research and development agreement (CRADA) entitled
"Development of a Genetic Test for Male Factor Infertility."
Prior to this, I was a co-principal investigator on a 
materials cooperative research and development agreement 
(MCRADA) to print oligonucleotide-based microarrays; and from 
1999 - 2002, I was coinvestigator on a CRADA to develop gene 
microarrays for toxicology applications. 

6. I presume the reader's familiarity with the 
basic construction and operation of microarrays. For purposes 
of the discussion to follow, I use the phrase "nucleic acid 
microarray" and, equivalently, the term "microarray" to refer 
generically to the various types of nucleic acid microarray 
that include immobilized nucleic acid probes of sufficient 
length to permit specific binding, with minimal cross- 
hybridization, to the probe's cognate transcript, whether the 
transcript is in the form of RNA or DNA. Although this 
definition excludes microarrays having shorter probes, such as 
the 20-mer probes of arrays manufactured by Affymetrix, Inc., 
many of the comments that follow nonetheless apply to such 
microarrays as well. 

7. Although my own work with microarrays dates 
back only to 1998, and high density spotted nucleic acid 
microarrays themselves date back perhaps only to 1995,1 
microarrays are by no means the only, nor the first, 
expression profiling tool. As I describe in detail in my 
Xenobiotica review,2 there are a number of other differential 
expression analysis technologies that precede the development 
of microarrays, some by decades, and that have been applied to 
drug metabolism and toxicology research, including: 
(1) differential screening; (2) subtractive hybridization, 
including variants such as chemical cross-linking subtraction, 
suppression-PCR subtractive hybridization and representational 
difference analysis; (3) differential display; (4) restriction 
endonuclease facilitated analyses, including serial analysis 
of gene expression (SAGE) and gene expression fingerprinting; 
and (5) EST analysis. 

8. In my own earlier research, I used both 
reverse-transcriptase polymerase chain reaction (RT-PCR) and 
suppression-PCR subtractive hybridization (SSH) to study 
patterns of differential gene expression caused by hepatic 
challenge with nongenotoxic and genotoxic hepatotoxins.3 



1 Schena et al., "Quantitative monitoring of gene expression patterns 
with a complementary DNA microarray," Science 270:467-470 (1995), attached 
hereto as Exhibit B. 

2 Rockett et al., "Differential gene expression in drug metabolism and 
toxicology: practicalities, problems and potential," Xenobiotica 29:655-691 
(1999) (hereinafter, "Xenobiotica review"), attached hereto as Exhibit C. 

3 See, e.g., Rockett et al., "Molecular profiling of non-genotoxic 
carcinogenesis using differential display reverse transcription polymerase 
chain reaction (ddRT-PCR)," European Journal of Drug Metabolism & 
Pharmacokinetics 22(4):329-33 (1997), and Rockett et al., "Use of a 
suppression-PCR subtractive hybridization method to identify gene species 
which demonstrate altered expression in male rat and guinea pig livers 
following 3-day exposure to [4-chloro-6-(2,3-xylidino)-2-pyrimidinylthio]acetic 
acid," Toxicology 144(1-3):13-29 (2000), attached hereto respectively as Exhibits 



9. These older transcript expression-profiling 
techniques provide analogous expression data, but with far 
lower throughput. 



10. It has been well-established, at least since 
the introduction of high density spotted microarrays in 1995, 
that: 

(i) each probe on the microarray, with 
careful design and sufficient length, and with 
sufficiently stringent hybridization and wash 
conditions, binds specifically and with minimal 
cross-hybridization, to the probe's cognate 
transcript; 

(ii) each additional probe makes an 
additional transcript newly detectable by the 
microarray, increasing the detection range, and 
thus versatility, of this analytical device for 
gene expression profiling;4 

(iii) it is not necessary that the 
biological function be known in order for the gene 



4 The compelling logic of this proposition has likely motivated the 
remarkably rapid progress from the earliest high density spotted arrays in 
1995 (Schena et al., "Quantitative monitoring of gene expression patterns 
with a complementary DNA microarray," Science 270:467-470 (1995), attached 
hereto as Exhibit B), to the first whole genome arrays in 1997 (Lashkari et 
al., "Yeast microarrays for genome wide parallel genetic and gene 
expression analysis," Proc. Natl. Acad. Sci. USA 94(24):13057-62 (1997) and 
DeRisi et al., "Exploring the metabolic and genetic control of gene 
expression on a genomic scale," Science 278(5338):680-6 (1997), attached 
hereto as Exhibits F and G, respectively), to the concurrent announcement 
by two companies earlier this month of their respective commercial 
introductions of single chip human whole genome arrays (Pollack, "Human 
Genome Placed on Chip; Biotech Rivals Put it Up for Sale," The New York 
Times, Thursday, October 2, 2003 (Business Day), attached hereto as 
Exhibit H; "Agilent Technologies ships whole human genome on single 
microarray to gene expression customers for evaluation," Press Release, 
Agilent Technologies, October 2, 2003, attached hereto as Exhibit I; 
"Affymetrix Announces Commercial Launch of Single Array for Human Genome 
Expression Analysis; More Than 1 Million Probes Analyze Expression Levels 
of Nearly 50,000 RNA Transcripts and Variants on a Single Array the Size of 
a Thumbnail," Press Release, Affymetrix, October 2, 2003, attached hereto 
as Exhibit J). 



or a fragment of the gene, to prove useful as a 
probe on a microarray to be used for expression 
analysis; 

(iv) failure of a probe to detect changes 
in expression of its cognate gene does not diminish 
the usefulness of the probe on the microarray; and 

(v) failure of a probe to detect a 
particular transcript in any single experiment does 
not deprive the probe of usefulness to the 
community of users who would use this research 
tool. 

These principles also apply to transcript expression profiling 
techniques that antedate the development of high density 
spotted microarrays, and accordingly were well-understood 
prior to 1995. 

11. Moreover, expression profiling is not limited 
to the measurement of mRNA transcript levels. It is widely 
understood among molecular and cellular biologists that 
protein expression levels provide complementary profiles for 
any given cell and cellular state. Although I cannot claim 
credit for having coined the phrase, I have written that the 
difference between transcript expression profiling and protein 
expression profiling is that "transcriptomics indicates what 
should happen, and proteomics shows what is happening."5 

12. For decades, such protein expression profiles 
have been generated using two dimensional polyacrylamide gel 



5 Rockett, "Macroresults through Microarrays," Drug Discovery Today 
7:804-805 (2002) (emphasis added), attached hereto as Exhibit K. 



electrophoresis (2D-PAGE), and used, among other things, to 
study drug effects.6 

13. Although the protein expression profiles 
produced by 2D-PAGE analysis are analogous to the transcript 
expression profiles provided by nucleic acid microarrays, an 
even closer analogy is perhaps offered by antibody 
microarrays; as I note in my Drug Discovery Today commentary, 
such antibody microarrays date back to the work of Roger Ekins 
in the mid- to late-1980s. 7 

14. The principles in paragraph 10 also apply to 
protein expression profiling analyses, particularly to 
analyses performed using antibody microarrays. Thus, as with 
nucleic acid microarrays, the greater the number of proteins 
detectable, the greater the power of the technique; the 
absence or failure of a protein to change in expression levels 
does not diminish the usefulness of the method; and prior 
knowledge of the biological function of the protein is not 
required. As applied to protein expression profiling, these 
principles have been well understood since at least as early 
as the 1980s. 

15. Both gene and protein expression profiling are 
particularly useful to the toxicologist, especially in the 
pharmaceutical industry. Accordingly, I made the following 



6 See, e.g., Anderson et al., "A two-dimensional gel database of rat 
liver proteins useful in gene regulation and drug effects studies," 
Electrophoresis 12:907-930 (1991), attached hereto as Exhibit L. 

7 See Ekins et al., J. Bioluminescence Chemiluminescence 5:59-78 
(1989); Ekins et al., Clin. Chem. 37:1955-1965 (1991); and Ekins, U.S. 
Patent Nos. 5,432,099, 5,807,755, and 5,837,551, attached hereto 
respectively as Exhibits M to Q. 



statements in my Xenobiotica review, written in the summer of 
1998: 

[I]n the field of chemical-induced 
toxicity, it is now becoming increasingly obvious 
that most adverse reactions to drugs and chemicals 
are the result of multiple gene regulation, some of 
which are causal and some of which are casually-related 
to the toxicological phenomenon per se. . . . 
This observation has led to an upsurge in interest 
in gene-profiling technologies which differentiate 
between the control and toxin-treated gene pools in 
target tissues and is, therefore, of value in 
rationalizing the molecular mechanisms of 
xenobiotic-induced toxicity. 

Knowledge of toxin-dependent gene 
regulation in target tissues is not solely an 
academic pursuit as much interest has been 
generated in the pharmaceutical industry to harness 
this technology in the early identification of 
toxic drug candidates, thereby shortening the 
developmental process and contributing 
substantially to the safety assessment of new 
drugs. 

For example, if the gene profile in 
response to say a testicular toxin that has been 
well-characterized in vivo could be determined in 
the testis, then this profile would be 
representative of all new drug candidates which act 
via this specific molecular mechanism of toxicity, 
thereby providing a useful and coherent approach to 
the early detection of such toxicants. 

Whereas it would be informative to know 
the identity and functionality of all genes up/down 
regulated by such toxicants, this would appear a 
longer term goal, as the majority of human genes 
have not yet been sequenced, far less their 
functionality determined. However, the current use 
of gene profiling yields a pattern of gene changes 
for a xenobiotic of unknown toxicity which may be 
matched to that of well-characterized toxins, thus 
alerting the toxicologist to possible in vivo 
similarities between the unknown and the 
standard. . . . 



Despite the development of multiple 
technological advances which have recently brought 
the field of gene expression profiling to the 
forefront of molecular analysis, recognition of the 
importance of differential gene expression and 
characterization of differentially expressed genes 
has existed for many years. 



16. As noted in the preceding excerpt from my 
Xenobiotica review, expression profiling in toxicology studies 
yields patterns of changes that are characteristic of an agent 
of unknown toxicity, which patterns may usefully be matched to 
those of well-characterized toxins. 

17. In the context of such patterns of gene 
expression, each additional gene-specific probe provides an 
additional signal that could not otherwise have been detected, 
giving a more comprehensive, robust, higher resolution — and 
thus more useful — pattern than otherwise would have been 
possible.8 
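
[Illustration] The pattern matching described in paragraphs 16 and 17 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a method taken from the Xenobiotica review: the probe names, the fold-change values, the reference toxin names, and the choice of Pearson correlation as the similarity measure are all hypothetical and chosen only for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a profile with no variation cannot be correlated")
    return cov / (sd_x * sd_y)


def best_match(unknown, references, probes):
    """Return (name, r) for the reference profile most correlated with
    the unknown agent's profile, using only the listed probes."""
    scores = {
        name: pearson([unknown[p] for p in probes], [ref[p] for p in probes])
        for name, ref in references.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical log2 fold changes for two well-characterized toxins.
# The two references differ only at probe "g4".
REFERENCES = {
    "toxin_A": {"g1": 2.0, "g2": -1.0, "g3": 0.5, "g4": 3.0},
    "toxin_B": {"g1": 2.0, "g2": -1.0, "g3": 0.5, "g4": -2.0},
}

# Hypothetical profile for an agent of unknown toxicity.
UNKNOWN = {"g1": 1.8, "g2": -0.9, "g3": 0.4, "g4": 2.7}

if __name__ == "__main__":
    for probes in (["g1", "g2", "g3"], ["g1", "g2", "g3", "g4"]):
        print(len(probes), "probes:", best_match(UNKNOWN, REFERENCES, probes))
```

With only the first three hypothetical probes the two reference profiles are indistinguishable, so the match is arbitrary; adding the fourth probe separates them. The sketch says nothing about whether the fourth gene's biological function is known, which is the point made in paragraph 17.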

18. It is my opinion, therefore, based on the state 
of the art in toxicology at least since the mid-1990s — and 
as regards protein profiling, even earlier -- that disclosure 
of the sequence of a new gene or protein, with or without 
knowledge of its biological function, would have been 



8 In a sense, each gene-specific probe used in such an analysis is 
analogous to a different one of the many parts of an engine, with each 
individual part, or subcombinations of such parts, deriving at least part 
of their usefulness from the utility of the completed combination, the 
functioning engine. 



sufficient information for a toxicologist to use the gene 
and/or protein in expression profiling studies in toxicology. 



19. The statements made in this declaration 
represent my individual views and are not intended to 
represent the opinion of my employer, the United States 
Environmental Protection Agency, or of any other branch of the 
federal government. Other than my current engagement to 
provide this declaration, I have neither had, nor currently 
have, financial ties to, or financial interest in, Incyte 
Corporation. I am not myself an inventor on any patent 
application claiming a gene or gene fragment. 

20. I declare further that all statements made 
herein of my own knowledge are true and that all statements 
made on information and belief are believed to be true, and 
further that these statements were made with the knowledge 
that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under 
Section 1001 of Title 18 of the United States Code and may 
jeopardize the validity of any patent application in which 
this declaration is filed or any patent that issues thereon. 


Date 

CURRICULUM VITAE 



PERSONAL DETAILS 



Name: John Coughlin Rockett III 
Nationality: USA 
Work Address: United States Environmental Protection Agency 
National Health and Environmental Effects Research Laboratory 
Reproductive Toxicology Division (MD-72) 
Gamete and Early Embryo Biology Branch 
Research Triangle Park 
NC 27711 
USA 
Work Telephone: +001 (919) 541 2678 
Work Fax: +001 (919) 541 4017 
Work E-mail: rockett.john@epa.gov 


Employment and Higher Education 

CURRENT POSITION (12/00-present) 
Research Biologist 

Gamete and Early Embryo Biology Branch (MD-72) 
Reproductive Toxicology Division 

National Health and Environmental Effects Research Laboratory 

US Environmental Protection Agency 

Research Triangle Park 

NC 27711 

USA 



PREVIOUS POSITIONS 

8/98-12/00: NHEERL Post-Doctoral Research Fellow, Gamete and Early Embryo Biology 
Branch, Reproductive Toxicology Division, National Health and Environmental Effects 
Research Laboratory, United States Environmental Protection Agency, Research Triangle Park, 
NC, USA. 

Supervisors: Dr Sally P. Darney (Scientific publications under Sally D. Perreault) and Dr David 
J. Dix. 

5/95-7/98: Rhone-Poulenc Post-Doctoral Research Fellow, Molecular Toxicology Group, School 
of Biological Sciences, University of Surrey, Guildford, Surrey, England. 
Supervisor: Prof. G. Gordon Gibson. 



EDUCATION 

Ph.D., 1995 - University of Warwick, Coventry, W. Midlands, England 

Title: Transforming Growth Factor-β and Immune Recognition Molecules in Oesophageal 
Cancer. 

Supervisors: Dr Alan G. Morris (University of Warwick) and Dr S. Jane Darnton (Birmingham 
Heartlands Hospital) 

B.Sc (Hons.), 1991 - University of Warwick, Coventry, W. Midlands, England. 

Degree: Microbiology and Microbial Technology (with intercalated year in industry), Class 2i. 

Tutor: Professor Howard Dalton. 



PROFESSIONAL ACTIVITIES 

Membership of Professional Societies: 

Society of Toxicology (Inc. Molecular Biology Speciality Section) (2001-present) 
Science Advisory Board (2001-present) 

North Carolina Chapter of the Society of Toxicology (1999-present) 

Triangle Consortium for Reproductive Biology (1999-present) 

Triangle Array Users Group (1999-present) 

Institute of Biology (U.K.) (1989 - present) 

British Toxicology Society (1996 - 2000) 

Biochemical Society (U.K.) (1992-1995) 

British Society for Immunology (1992-1995) 



Membership of Scientific Committees: 



International Life Sciences Institute's (ILSI) Health and Environmental Sciences Institute (HESI) 
Technical Committee on the Application of Genomics to Mechanism-Based Risk Assessment: 

• Steering Committee (5/02-present). 

• Hepatotoxicity Working Group Vice-Chair (5/02-present). 

• Hepatotoxicity Work Group Member (5/01-present). 

Charter member, Fertility and Early Pregnancy Work Group of the National Children's Study 
(07/01-present). 

National Health and Environmental Effects Research Laboratory Distinguished Lecture Series 
Committee (July 03-present). 

U.S. Environmental Protection Agency Genomics Task Force Microarray Technical 
Subcommittee (August 03-present). 

National Health and Environmental Effects Research Laboratory Genomics and Proteomics 
Committee (NGPC) (September 03-present). 



Professional Meetings: 

Invited participant ("Observer") in Expert Panel Workshop: "The Role of Environmental Factors 
on the Onset and Progression of Puberty in Children". Organised by Serono Symposia 
International. November 6-8th, 2003, Chicago, IL, USA. 

Joint organiser and co-chair of: "Genomic analysis of surrogate tissues for measuring toxic 
exposures and drug action", the "Innovations in Applied Toxicology" Symposium for the Society 
of Toxicology 42nd Annual Meeting, March 9th-13th, 2003, Salt Lake City, UT, USA. 

(8) John C. Rockett, David J. Esdaile and G Gordon Gibson (1999). Differential gene expression 
in drug metabolism: practicalities, problems and potential. Xenobiotica, 29(7):655-691. 
(7) MC Murphy, CN Brookes, JC Rockett, C Chapman, JA Lovegrove, BJ Gould, JW Wright and 
CM Williams (1999). The quantitation of lipoprotein lipase mRNA in biopsies of human adipose 
tissue, using the polymerase chain reaction, and the effect of increased consumption of n-3 
polyunsaturated fatty acids. European Journal of Clinical Nutrition, 53:441-447. 

(6) JC Rockett, DJ Esdaile and GG Gibson (1997). Molecular profiling of non-genotoxic 
carcinogenesis using differential display reverse transcription polymerase chain reaction 
(ddRT-PCR). European Journal of Drug Metabolism & Pharmacokinetics 22(4):329-33. 

(5) Rockett, J., Larkin, K., Darnton, S., Morris, A. and Matthews, H. (1997). Five newly 
established oesophageal carcinoma cell lines: phenotypic and immunological characterisation. 
British Journal of Cancer 75(2):258-263. 

(4) J C Rockett, S J Darnton, J Crocker, H R Matthews and A G Morris (1996). Lymphocyte 
infiltration in oesophageal carcinoma: lack of correlation with MHC antigens, ICAM-1, and tumour 
stage and grade. Journal of Clinical Pathology 49:264-267. 

(3) J C Rockett, S J Darnton, J Crocker, H R Matthews and A G Morris (1995). Expression of 
HLA-ABC and HLA-DR histocompatibility antigens and intercellular adhesion molecule-1 in 
oesophageal carcinoma. Journal of Clinical Pathology 48:539-44. 

(2) Salam M, Rockett J and Morris A (1995). The prevalence of different human papillomavirus 
types and p53 mutations in laryngeal carcinomas: is there a reciprocal relationship? European 
Journal of Surgical Oncology 21:290-296. 

(1) Salam M, Rockett J and Morris A (1995). General primer-mediated polymerase chain reaction 
for simultaneous detection and typing of HPV in laryngeal carcinomas. Clinical Otolaryngology 
20:84-88. 



(2) Articles Submitted To A Scientific Journal 

(4) John C Rockett, Judith E. Schmid, Christopher J. Luft, J. Brian Garges, M. Stacey Ricci, 
Pasquale Patrizio, Norman B. Hecht and David J. Dix. Gene Expression Patterns Associated with 
Infertility in Rodent and Human Models. *An invited submission* 

(3) Roger Ulrich, John C. Rockett, G. Gordon Gibson and Syril Pettit. Evaluating the Effects of 
Methapyrilene and Clofibrate on Hepatic Gene Expression: A Collaboration Between Laboratories 
and a Comparison of Platform and Analytical Approaches. 

(2) Valerie A Baker, Helen M Harries, Jeffrey F Waring, Roger Jolly, Angus de Souza, Judith E 
Schmid, Hong Ni, Roger Brown, Roger G Ulrich and John C. Rockett. Clofibrate-Induced Gene 
Expression Changes in Rat Liver: A Cross-Laboratory Analysis Using Membrane cDNA Arrays. 

(1) [first author illegible], David Dix, Adrian Platts, John C. Rockett, Stephen A. 
Krawetz. Nuclease digestion of sperm chromatin suggests a random distribution of gene sequences. 

(3) Articles In Preparation For Submission To A Scientific Journal 

(3) Spearow J, DB Tully, JC Rockett and DJ Dix. Differential testicular gene expression in mouse 
strains sensitive and resistant to endocrine disruption by estrogen. 

(2) Sally D. Perreault, John C. Rockett, Laura Fenster, James [remainder of entry illegible] 

(1) J. Christopher Luft, Douglas B. Tully, John C. Rockett, Judith E. Schmid and 
David J. Dix. Reproductive and genomic effects in testes from mice exposed to the water 
disinfectant byproduct bromochloroacetic acid. 

(4) Book Chapters 

(4) John C Rockett. Gene Microarrays Applied to Reproductive Toxicology. In Cunningham 
(ed): [title partly illegible] Applications in Toxicity Testing, The Humana Press, Totowa. In 
preparation. *An invited submission* 

(3) John C. Rockett and David J Dix. Gene Expression Networks. In Cooper (ed-in-chief): 

(2) John C. Rockett. The Future of Toxicogenomics. In Michael Burczynski (ed): "An 
[remainder of title illegible], Boca Raton [remainder illegible] 

oesophageal carcinoma. Peracchia A, Rosati R, Bonavina L, Bona S, Chella B (eds.) Recent 
Advances in Diseases of the Esophagus. Bologna: Monduzzi Editore, pp 45-49 (1996) 

(5) Other Scientific Publications (Letters to Editors; Meeting Reports; Comment 

(11) John C. Rockett (2003). Probing the nature of microarray-based oligonucleotides. Drug 
Discovery Today 8(9):389. (A Letter To The Editor) *An invited submission* 

(10) John C. Rockett (2003). To confirm or not to confirm (microarray data) - that is the question. 
Drug Discovery Today 8(8):343. (A Letter To The Editor) 

(9B) Nazzareno Ballatori, James L. Boyer, and John C. Rockett. (2003). Exploiting Genome Data 
to Understand the Function, Regulation and Evolutionary Origins of Toxicologically Relevant 
Genes. Environ Health Perspect. 111(6):871-5. (A Meeting Report) 

(9A) Nazzareno Ballatori, James L. Boyer, and John C. Rockett. (2003). Exploiting Genome Data 

in which a gene is expressed provide clues to its biological role. The large and 
expanding database of complementary DNA (cDNA) [illegible] of defining the patterns 
at [illegible] we used the small flowering plant Arabidopsis thaliana as a model 
organism. Arabidopsis possesses many advantages for gene expression analysis, 
including the fact that it has the smallest genome of any higher eukaryote examined 
to date (2). Forty-five cloned Arabidopsis [illegible] sequences and 31 expressed 
sequence tags (ESTs) were used as gene-specific targets. We obtained the ESTs by 
selecting cDNA [illegible] that 28 of the 31 ESTs matched sequence [illegible] 
cDNAs from other organisms served as controls in the experiments. 

[Author footnote, partly illegible:] P. O. Brown, Department of Biochemistry, 
[illegible] Stanford [illegible]. 
Present address: Synteni, Palo Alto, CA 94303, USA. 
To whom correspondence should be addressed. E- 


The 48 cDNAs, averaging ~1.0 kb, were amplified with the polymerase chain 
[illegible] individual wells of a 96-well microtiter plate. Each sample was 
duplicated in two adjacent wells to allow the reproducibility of the arraying and 
hybridization process to be tested. Samples from the microtiter plate were printed 
onto glass microscope slides in an area measuring 3.5 mm by 5.5 mm with the use of 
a high-speed arraying machine (3). The arrays were processed by chemical and heat 
treatment to attach the DNA sequences to the glass surface and denature them (3). 
Three arrays, printed in a single lot, were used for the experiments here. A single 
microtiter plate of PCR products provides sufficient material to print at least 
500 arrays. 

[illegible] total Arabidopsis mRNA (4) by a single round of reverse 
transcription (5). The Arabidopsis mRNA was supplemented with human acetylcholine 
receptor (AChR) mRNA at a dilution of 1:10,000 (w/w) before cDNA synthesis, to 
provide an internal standard for [illegible] labeled cDNA mixture was hybridized 
to an array at high stringency (6) and scanned 




with a laser (3). A high-sensitivity scan gave signals that saturated the detector 
at nearly all of the Arabidopsis target sites (Fig. 1A). Calibration relative to 
the AChR mRNA standard (Fig. 1A) established a sensitivity limit of ~1:50,000. No 
detectable hybridization was observed to either the rat glucocorticoid receptor 
(Fig. 1A) or the yeast TRP4 (Fig. 1A) targets even at the highest scanning 
sensitivity. A moderate-sensitivity scan of the same array allowed linear detection 
of the more abundant transcripts (Fig. 1B). Quantitation of both scans revealed a 
range of expression levels spanning three orders of magnitude for the 45 genes 
tested (Table 2). RNA blots (7) for several genes (Fig. 2) corroborated the 
expression levels measured with the microarray to within a factor of 5 (Table 2). 

[Figure 1 image not reproduced. Panel titles include "A High sensitivity", 
"B Moderate sensitivity" and "F Leaf tissue"; array columns are numbered 1 to 12.] 

Fig. 1. Gene expression monitored with the use of cDNA microarrays. [illegible] 
(B) Same array as in (A) but scanned at moderate sensitivity. (C and D) A single 
array was [illegible]. (E and F) [illegible] cDNA from root tissue and 
lissamine-labeled cDNA from leaf tissue. The single array was then scanned 
successively to detect the fluorescein fluorescence corresponding to mRNAs 
expressed in roots (E) and the lissamine fluorescence corresponding to mRNAs 
expressed in leaves (F). 

Differential gene expression was investigated with a simultaneous, two-color 
hybridization scheme, which served to minimize experimental variation inherent in 
the comparison of independent hybridizations. Fluorescent probes were prepared from 
two mRNA sources with the use of reverse transcriptase in the presence of 
fluorescein- and lissamine-labeled nucleotide analogs, respectively (5). The two 
probes were then mixed together in equal proportions, hybridized to a single array, 
and scanned separately for fluorescein and lissamine emission after independent 
excitation of the two fluorophores (3). 

To test whether overexpression of a single gene could be detected in a pool of 
total Arabidopsis mRNA, we used a microarray to analyze a transgenic line 
overexpressing the single transcription factor HAT4 (8). Fluorescent probes 
representing mRNA from wild-type and HAT4-transgenic plants were labeled with 
fluorescein and lissamine, respectively; the two probes were then mixed and 
hybridized to a single array. An intense hybridization signal was observed at the 
position of the HAT4 cDNA in the lissamine-specific scan (Fig. 1D), but not in the 
fluorescein-specific scan of the same array (Fig. 1C). Calibration with AChR mRNA 
added to the fluorescein and lissamine cDNA synthesis reactions at dilutions of 
1:10,000 (Fig. 1C) and 1:100 (Fig. 1D), respectively, revealed a 50-fold elevation 
of HAT4 mRNA in the transgenic line relative to its abundance in wild-type plants 
(Table 2). This magnitude of HAT4 overexpression matched that inferred from the 
Northern (RNA) analysis within a factor of 2 (Fig. 2 and Table 2). Expression of 
all the other genes monitored on the array differed by less than a factor of 5 
between HAT4-transgenic and wild-type plants (Fig. 1, C and D, and Table 2). 
Hybridization of fluorescein-labeled glucocorticoid receptor cDNA (Fig. 1C) and 
lissamine-labeled TRP4 cDNA (Fig. 1D) verified the presence of the negative control 
targets and the lack of optical cross talk between the two fluorophores. 

[Figure 2 image not reproduced. Legible panel labels include "Wild type", "ROC1", 
"Human AChR" and "mRNA (ng)".] 

Fig. 2. Gene expression monitored with RNA (Northern) blot analysis. Designated 
amounts of mRNA from wild-type and HAT4-transgenic plants were spotted onto nylon 
membranes and probed with the cDNAs indicated. Purified human AChR mRNA was used 
for calibration. 

To explore a more complex alteration in expression patterns, we performed a second 
two-color hybridization experiment with fluorescein- and lissamine-labeled probes 
prepared from root and leaf mRNA, respectively. The scanning sensitivities for the 
two fluorophores were normalized by matching the signals resulting from AChR mRNA, 
which was added to both cDNA synthesis reactions at a dilution of 1:1000 (Fig. 1, 
E and F). A comparison of the scans revealed widespread differences in gene 
expression between root and leaf tissue (Fig. 1, [illegible] -lated CAB1 gene was 
~500-fold more abundant [illegible] of 26 other genes [illegible] 

The HAT4-transgenic line we examined [illegible], early flowering, [illegible], 
and altered pigmentation (8). Although changes in expression were 

EST23 
EST29 
GBF-2 
EST34 
EST35 
EST41 
rGR 
EST42 
EST45 
HAT1 
EST46 
EST49 
HAT2 
HAT4 
EST50 
HAT5 
HAT22 
EST52 
EST59 
EST60 
EST69 
PPH1 
EST70 
EST75 
EST78 
ROC1 
EST82 
EST83 
EST84 
EST91 
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Human AChR 
Actin 
Alcohol dehydrogenase 
Unknown 
Actin 
Chlorophyll a/b binding 
Phosphoglycerate kinase 
Gibberellic acid biosynthesis 
Unknown 
G-box binding factor 1 
Elongation factor 
Aldolase 
G-box binding factor 2 
Chloroplast protease 
Unknown 
Catalase 
Rat glucocorticoid receptor 
Unknown 
ATPase 
Homeobox-leucine zipper 1 
Light harvesting complex 
Unknown 
Homeobox-leucine zipper 2 
Homeobox-leucine zipper 4 
Phosphoribulokinase 
Homeobox-leucine zipper 5 
Unknown 
Homeobox-leucine zipper 22 
Oxygen evolving 
Unknown 
Knotted-like homeobox 1 
RuBisCO small subunit 
Translation elongation factor 
Protein phosphatase 1 
Unknown 
Chloroplast protease 
Unknown 
Cyclophilin 
GTP binding 
Unknown 
Unknown 
Unknown 
Unknown 
Synaptobrevin 
Light harvesting complex 
Light harvesting complex 
Yeast tryptophan biosynthesis 
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227010 
M20016 
U36594f 
T45763 
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L37126 
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X52256 
T04477 
X63895 
R87034 
T14152 
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U09332 
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U09335 
M90394 
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M90416 
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T44621 
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233795 
T45278 
T13832 
R64816 
M90418 



observed for HAT4, large changes in expression were not observed for any of the 
other 44 genes we examined. This was somewhat surprising, particularly because 
comparative analysis of leaf and root tissue identified 27 differentially expressed 
genes. Analysis of an expanded set of genes may be required to identify genes whose 
expression changes upon HAT4 overexpression; alternatively, a comparison of mRNA 
populations from specific tissues of wild-type and [illegible] 

[illegible] a single array would be sufficient to provide gene-specific 
[illegible] nearly the entire repertoire of expressed genes in the Arabidopsis 
genome (2). The availability of [number illegible] ESTs from Arabidopsis (1, 9) 
would provide a rich source of templates for such studies. 

The estimated 100,000 genes in the human genome (10) exceeds the number of 
Arabidopsis genes by a factor of 5 (2). This modest increase in complexity suggests 
that [illegible] could be used to determine the expression patterns of tens of 
thousands of [illegible] an amplification strategy to the reverse transcription 
reaction (11) could make it feasible to monitor expression even in minute tissue 
samples. A wide variety of acute and chronic physiological and pathological 
conditions might lead to characteristic changes in the patterns of gene expression 
in peripheral blood cells or other easily sampled tissues. In concert with cDNA 
microarrays for monitoring complex expression patterns, these tissues might 
therefore serve as sensitive in vivo sensors for clinical diagnosis. Microarrays 
of cDNAs could thus 

[Table 2 caption fragment:] [illegible] amounts of AChR mRNA. Values for the 
microarray were determined from microarray scans (Fig. 1); values for the 
[illegible] determined from RNA blots (Fig. 2). 
Severe combined immunodeficiency asso- 
ciated with inherited deficiency of ADA 
(1) is usually fatal unless affected children 
are kept in protective isolation or the im- 
mune system is reconstituted by bone mar- 
row transplantation from a human leuko- 
cyte antigen (HLA)-identical sibling donor 
(2). This is the therapy of choice, although 
it is available only for a minority of patients. 

In recent years, other forms of therapy have 
been developed, including transplants from 
haploidentical donors (3, 4), exogenous en- 
zyme replacement (5), and somatic-cell 
gene therapy (6-9). 

We previously reported a preclinical mod- 
el in which ADA gene transfer and expression 

P* ^L^c ^"S?\ G - D "Woni. C. Rossi. 

for Genetic Diseases, DIBIT, Istituto Scientifico H. S. Raffaele, 
Milan, Italy. 

L. D. Notarangelo, E. Mazzolari, A. G. Ugazio, Depart- 
ment of Pediatrics, University of Brescia Medical School, 
Brescia, Italy. 

G. Casorati, Unità di Immunochimica, DIBIT, Istituto Sci- 
entifico H. S. Raffaele, Milan, Italy. 
P. Panina, Roche Milano Ricerche, Milan, Italy. 
*To whom correspondence should be addressed. 
successfully restored immune functions in hu- 
man ADA-deficient (ADA−) peripheral 
blood lymphocytes (PBLs) in immunodefi- 
cient mice in vivo (10, 11). On the basis of 
these preclinical results, the clinical applica- 
tion of gene therapy for the treatment of 
ADA− SCID (severe combined immunodefi- 
ciency disease) patients who previously failed 
exogenous enzyme replacement therapy was 
approved by our Institutional Ethical Com- 
mittees and by the Italian National Commit- 
tee for Bioethics (12). In addition to evaluat- 
ing the safety and efficacy of the gene therapy 
procedure, the aim of the study was to define 
the relative role of PBLs and hematopoietic 
stem cells in the long-term reconstitution of 
immune functions after retroviral vector-me- 
diated ADA gene transfer. For this purpose, 
two structurally identical vectors expressing 
the human ADA complementary DNA 
(cDNA), distinguishable by the presence of 
alternative restriction sites in a nonfunctional 
region of the viral long-terminal repeat 
(LTR), were used to transduce PBLs and bone 
marrow (BM) cells independently. This pro- 
cedure allowed identification of the origin of 
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subssoutnt to «„ob£tk cMwL ? \ ■ , ""Tp™ oivnonmenul conditions or 

groups using the technique. amerenuaiiy expressed genes as there are research 

son* of the practical asptos of S» !h?£* ™ S ^" " cludtd " * ^aission on 

brtol, a wall-taLn ^ „7C™ 8 8 ™LL^ 8 Tnz f ™r e " POiU " *° 
x«nobio,£ toiictnS, S rXT tau™,,!?"?" °' "T? 1 'J 21 ™ "»1 
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Introduction 

It is becoming apparent that the development of almost all cancers and many non- 
neoplastic diseases are accompanied by altered gene expression in the affected cells 

"If™™* Sme (Hunter 1991 » WynforS-ThomJ 99 Tvogelste^ 
and Kinzler 1993, Semenza 1994, Cassidy 1995, Kleinjan and Van Heyningen 1998). 
Such changes also occur m response to external stimuli such as pathogenTmkro 
S3 ^ner ./ 1996, Singh et al. 1997, Griffin and Krishna 1 9%TunZ 

199§ Z ^T T- (S T " al 1995 « D °^ ra et al - 19 98, Ramana and S 
RnH?' JSu" dUnng ? C devel °P m «« of undifferentiated cells (Hecht 1 998 
Rudin and Thompson 1998, Schneider-Maunoury et al 1998 ) Th* T, . I 
medical and therapeutic benefits of understanding the molecular changes which 
occur in any given cell in progressing from the normal to the 'altered' state are 
enormous. Such profiling essentially provides a 'fingerprint' of each step of a 

t Cu^/A^^Tc^^'^- S-eibs n@surrey.ac.uk " 

† Rhône-Poulenc Agrochemicals, Toxicology Department, Sophia-Antipolis, Nice, France 
cell's development or response and should help in the elucidation of specific and 
sensitive biomarkers representing, for example, different types of cancer or previous 
exposure to certain classes of chemicals that are enzyme inducers. 

In drug metabolism, many of the xenobiotic-metabolizing enzymes (including 
the well-characterized isoforms of cytochrome P450) are inducible by drugs and 
chemicals in man (Pelkonen et al. 1998), predominantly involving transcriptional 
activation of not only the cognate cytochrome P450 genes, but additional cellular 
proteins which may be crucial to the phenomenon of induction. Accordingly, the 
development of methodology to identify and assess the full complement of genes 
that are either up- or down-regulated by inducers is crucial in the development of 
knowledge to understand the precise molecular mechanisms of enzyme induction 
and how this relates to drug action. Similarly, in the field of chemically-induced 
toxicity, it is now becoming increasingly obvious that most adverse reactions to 
drugs and chemicals are the result of multiple gene regulation, some of which are 
causal and some of which are casually-related to the toxicological phenomenon per se. 
This observation has led to an upsurge in interest in gene-profiling technologies 
which differentiate between the control and toxin-treated gene pools in target tissues 
and are, therefore, of value in rationalizing the molecular mechanisms of xenobiotic- 
induced toxicity. Knowledge of toxin-dependent gene regulation in target tissues is 
not solely an academic pursuit as much interest has been generated in the 
pharmaceutical industry to harness this technology in the early identification of toxic 
drug candidates, thereby shortening the developmental process and contributing 
substantially to the safety assessment of new drugs. For example, if the gene profile 
in response to say a testicular toxin that has been well-characterized in vivo is 
determined in the testis, then this profile would be representative of all new drug 
candidates which act via this specific molecular mechanism of toxicity, thereby 
providing a useful and coherent approach to the early detection of such toxicants. 
Whereas it would be informative to know the identity and functionality of all genes 
up/down regulated by such toxicants, this would appear a longer term goal, as the 
majority of human genes have not yet been sequenced, far less their functionality 
determined. However, the current use of gene profiling yields a pattern of gene 
changes for a xenobiotic of unknown toxicity which may be matched to that of well- 
characterized toxins, thus alerting the toxicologist to possible in vivo similarities 
between the unknown and the standard, thereby providing a platform for more 
extensive toxicological examination. Such approaches are beginning to gain 
momentum, in that several biotechnology companies are commercially producing 
'gene chips' or 'gene arrays' that may be interrogated for toxicity assessment of 
xenobiotics. These chips consist of hundreds/thousands of genes, some of which are 
degenerate in the sense that not all of the genes are mechanistically-related to any 
one toxicological phenomenon. Whereas these chips are useful in broad-spectrum 
screening, they are maturing at a substantial rate, in that gene arrays are now 
becoming more specific, e.g. chips for the identification of changes in growth factor 
families that contribute to the aetiology and development of chemically-induced 
neoplasias. 
Although documenting and explaining these genetic changes presents a 
formidable obstacle to understanding the different mechanisms of development and 
disease progression, the technology is now available to begin attempting this difficult 

task. Several 'differential expression analysis' methods have been 
developed which facilitate the identification of gene products that demonstrate 
altered expression in cells of one population compared to another. These methods 
have been used to identify differential gene expression in many situations, including 
invading pathogenic microbes (Zhao et al. 1998), in cells responding to extracellular 
and intracellular microbial invasion (Duguid and Dinauer 1990, Ragno et al. 1997, 
Maldarelli et al. 1998), in chemically treated cells (Syed et al. 1997, Rockett et al. 
1999), neoplastic cells (Liang et al. 1992, Chang and Terzaghi-Howe 1998), 
activated cells (Gurskaya et al. 1996, Wan et al. 1996), differentiated cells (Hara et 
al. 1991, Guimaraes et al. 1995a, b), and different cell types (Davis et al. 1984, 
Hedrick et al. 1984, Xhu et al. 1998). Although differential expression analysis 
technologies are applicable to a broad range of models, perhaps their most important 
advantage is that, in most cases, absolutely no prior knowledge of the specific genes 
which are up- or down-regulated is required. 

The field of differential expression analysis is a large and complex one, with 
many techniques available to the potential user. These can be categorized into 
several methodological approaches, including: 

(1) Differential screening, 

(2) Subtractive hybridization (SH) (includes methods such as chemical cross- 
linking subtraction, CCLS; suppression-PCR subtractive hybridization, 
SSH; and representational difference analysis, RDA), 

(3) Differential display (DD), 

(4) Restriction endonuclease facilitated analysis (including serial analysis of gene 
expression— SAGE— and gene expression fingerprinting — GEF), 

(5) Gene expression arrays, and 

(6) Expressed sequence tag (EST) analysis. 

The above approaches have been used successfully to isolate differentially 
expressed genes in different model systems. However, each method has its own 
subtle (and sometimes not so subtle) characteristics which incur various advantages 
and disadvantages. Accordingly, it is the purpose of this review to clarify the 
mechanistic principles underlying the main differential expression methods and to 
highlight some of the broader considerations and implications of this very powerful 
and increasingly popular technique. Specifically, we will concentrate on the so- 
called 'open' systems, namely those which do not require any knowledge of gene 
sequences and, therefore, are useful for isolating unknown genes. Two 'closed' 
systems (those utilising previously identified gene sequences), EST analysis and the 
use of DNA arrays, will also be considered briefly for completeness. Whilst 
emphasis will often be placed on suppression PCR subtractive hybridization (SSH, 
the approach employed in this laboratory), it is the aim of the authors to highlight, 
wherever possible, those areas of common interest to those who use, or intend to use, 
differential gene expression analysis. 



Differential cDNA library screening (DS) 

Despite the development of multiple technological advances which have recently 
brought the field of gene expression profiling to the forefront of molecular analysis, 
recognition of the importance of differential gene expression and characterization of 
differentially expressed genes has existed for many years. One of the original 
approaches used to identify such genes was described 20 years ago by St John and 
Davis (1979). These authors developed a method, termed 'differential plaque filter 
hybridization', which was used to isolate galactose-inducible DNA sequences from 
yeast. The theory is simple: a genomic DNA library is prepared from normal, 
unstimulated cells of the test organism/tissue and multiple filter replicas are 
prepared. These replica blots are probed with radioactively (or otherwise) labelled 
complex cDNA probes prepared from the control and test cell mRNA populations. 
Those mRNAs which are differentially expressed in the treated cell population will 
show a positive signal only on the filter probed with cDNA from the treated cells. 
Furthermore, labelled cDNA from different test conditions can be used to probe 
multiple blots, thereby enabling the identification of mRNAs which are only up- 
regulated under certain conditions. For example, St John and Davis (1979) screened 
replica filters with acetate-, glucose- and galactose-derived probes in order to obtain 
genes induced specifically by galactose metabolism. Although groundbreaking in its 
time, this method is now considered insensitive and time-consuming, as up to 2 
months are required to complete the identification of genes which are differentially 
expressed in the test population. In addition, there is no convenient way to check 
that the procedure has worked until the whole process has been completed. 
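
The comparison of replica filters described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the authors' protocol: the clone names, signal values and the `threshold` cut-off are all hypothetical, and real filters would be scored from autoradiographs rather than from a dictionary.

```python
def differential_clones(control_signal, treated_signal, threshold=1.0):
    """Return clones positive on the treated filter but not on the control filter.

    Both arguments map a clone ID to a hybridization signal (arbitrary units).
    `threshold` is a hypothetical cut-off separating positive from negative.
    """
    positives = []
    for clone, treated in treated_signal.items():
        control = control_signal.get(clone, 0.0)
        if treated >= threshold and control < threshold:
            positives.append(clone)
    return sorted(positives)


# Hypothetical signals from two replica filters of the same library.
control = {"clone_A": 5.2, "clone_B": 0.1, "clone_C": 0.0}
treated = {"clone_A": 5.0, "clone_B": 4.8, "clone_C": 0.2}

print(differential_clones(control, treated))
```

Here clone_A is expressed under both conditions, clone_C under neither, and only clone_B meets the criterion of a positive signal on the treated filter alone.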

Subtractive Hybridization (SH) 

The developing concept of differential gene expression and the success of early 
approaches such as that described by St John and Davis (1979) soon gave rise to a 
search for more convenient methods of analysis. One of the first to be developed was 
SH, numerous variations of which have since been reported (see below). In general, 
this approach involves hybridization of mRNA/cDNA from one population (tester) 
to excess mRNA/cDNA from another (driver), followed by separation of the 
unhybridized tester fraction (differentially expressed) from the hybridized common 
sequences. This step has been achieved physically, chemically and through the use 
of selective polymerase chain reaction (PCR) techniques. 

Physical separation 

Early subtractive hybridization technology involved the physical separation 
of hybridized common species from unique single stranded species. Several methods 
of achieving this have been described, including hydroxyapatite chromatography 
(Sargent and Dawid 1983), avidin-biotin technology (Duguid and Dinauer 1990) 
and oligodT-latex separation (Hara et al. 1991). In the first approach, common 
mRNA species are removed by cDNA (from test cells)-mRNA (from control cells) 
subtractive hybridization followed by hydroxyapatite chromatography, as hydroxy- 
apatite specifically adsorbs the cDNA-mRNA hybrids. The unadsorbed cDNA is 
then used either for the construction of a cDNA library of differentially expressed 
genes (Sargent and Dawid 1983, Schneider et al. 1988) or directly as a probe to 
screen a preselected library (Zimmerman et al. 1980, Davis et al. 1984, Hedrick et al. 
1984). A schematic diagram of the procedure is shown in figure 1. 

Less rigorous physical separation procedures coupled with sensitivity enhancing 
PCR steps were later developed as a means to overcome some of the problems 
encountered with the hydroxyapatite procedure. For example, Duguid and Dinauer 
(1990) described a method of subtraction utilizing biotin-affinity systems as a means 
to remove hybridized common sequences. In this process, both the control and 
tester mRNA populations are first converted to cDNA and an adaptor ('oligovector', 
[Figure 1 diagram. Flow: control (driver) mRNA and tester (test) cDNA (1st strand); 
mix (ratio >35:1) and hybridize; hydroxyapatite chromatography, RNAxDNA hybrids 
removed; unhybridized cDNA (differentially expressed) and mRNA; Sepharose CL6B 
exclusion chromatography, small cDNA fragments (<450 bp); enriched, differentially 
expressed cDNA; produce clones, or label directly and probe library.] 

Figure 1. The hydroxyapatite method of subtractive hybridization. cDNA derived from the 
treated/altered (tester) population is mixed with a large excess of mRNA from the control (driver) 
population. Following hybridization, mRNA-cDNA hybrids are removed by hydroxyapatite 
chromatography. The only cDNAs which remain are those which are differentially expressed in 
the treated/altered population. In order to facilitate the recovery of full length clones, small cDNA 
fragments are removed by exclusion chromatography. The remaining cDNAs are then cloned into 
a vector for sequencing, or labelled and used directly to probe a library, as described by Sargent 
and Dawid (1983). 



containing a restriction site) ligated to both sides. Both populations are then 
amplified by PCR, but the driver cDNA population is subsequently digested with 
the adaptor-containing restriction endonuclease. This serves to cleave the oligo- 
vector and reduce the amplification potential of the control population. The digested 
control population is then biotinylated and an excess mixed with tester cDNA. 
Following denaturation and hybridization, the mix is applied to a biocytin column 
(streptavidin may also be used) to remove the control population, including 
heteroduplexes formed by annealing of common sequences from the tester 
population. The procedure is repeated several times following the addition of fresh 
[Figure 2 diagram. Flow: control (driver) mRNA and test (tester) mRNA; anneal mRNA 
to polydT latex beads; cDNA synthesis; mix and anneal; centrifuge beads, collect and 
store supernatant, dissociate polyA, reapply supernatant; tester-specific mRNA retrieved 
after 4 rounds of hybridization; cDNA synthesis; ligate adaptors and insert into vector; 
sequence inserts and/or carry out other downstream applications.] 

Figure 2. The use of oligo(dT30) latex to perform subtractive hybridization. mRNA extracted from the 
control (driver) population is converted to anchored cDNA using polydT oligonucleotides 
attached to latex beads. mRNA from the treated/altered (tester) population is repeatedly 
hybridized against an excess of the anchored driver cDNA. The final population of mRNA is 
tester specific and can be converted into cDNA for cloning and other downstream applications, as 
described by Hara et al. (1991). 
control cDNA. In order to further enrich those species differentially expressed in 
the tester cDNA, the subtracted tester population is amplified by PCR following 
every second subtraction cycle. After six cycles of subtraction (three reamplification 
steps) the reaction mix is ligated into a vector for further analysis. 

In a slightly different approach, Hara et al. (1991) utilized a method whereby 
oligo(dT30) primers attached to a latex substrate are used to first capture mRNA 
extracted from the control population. Following 1st strand cDNA synthesis, the 
RNA strand of the heteroduplexes is removed by heat denaturation and centri- 
fugation (the cDNA-oligotex-dT30 forms a pellet and the supernatant is removed). 
A quantity of tester mRNA is then repeatedly hybridized to the immobilized control 
(driver) cDNA (which is present in 20-fold excess). After several rounds of 
hybridization the only mRNA molecules left in the tester mRNA population are 
those which are not found in the driver cDNA-oligotex-dT30 population. These 
tester-specific mRNA species are then converted to cDNA and, following the 
addition of adaptor sequences, amplified by PCR. The PCR products are then 
ligated into a vector for further analysis using restriction sites incorporated into the 
PCR primers. A schematic illustration of this subtraction process is shown in figure 2. 

However, all these methods utilising physical separation have been described as 
inefficient due to the requirement for large starting amounts of mRNA, significant 
loss of material during the separation process and a need for several rounds of 
hybridization. Hence, new methods of differential expression analysis have recently 
been designed to eliminate these problems. 



Chemical Cross-Linking Subtraction (CCLS) 

In this technique, originally described by Hampson et al. (1992), driver mRNA 
is mixed with tester cDNA (1st strand only) in a ratio of > 20:1. The common 
sequences form cDNA:mRNA hybrids, leaving the tester specific species as single 
stranded cDNA. Instead of physically separating these hybrids, they are inactivated 
chemically using 2,5-diaziridinyl-1,4-benzoquinone (DZQ). Labelled probes are 
then synthesized from the remaining single stranded cDNA species (unreacted 
mRNA species remaining from the driver are not converted into probe material due 
to specificity of Sequenase T7 DNA polymerase used to make the probe) and used 
to screen a cDNA library made from the tester cell population. A schematic diagram 
of the system is shown in figure 3. 

It has been shown that the differentially expressed sequences can be enriched at 
least 300-fold with one round of subtraction (Hampson et al. 1992), and that the 
technique should allow isolation of cDNAs derived from transcripts that are present 
at less than 50 copies per cell. This equates to genes at the low end of intermediate 
abundance (see table 1). The main advantages of the CCLS approach are that it is 
rapid, technically simple and also produces fewer false positives than other 
differential expression analysis methods. However, like the physical separation 
protocols, a major drawback with CCLS is the large amount of starting material 
required (at least 10 µg RNA). Consequently, the technique has recently been 
refined so that a renewable source of RNA can be generated. The degenerate random 
oligonucleotide primed (DROP) adaptation (Hampson et al. 1996, Hampson and 
Hampson 1997) uses random hexanucleotide sequences to prime solid phase- 
synthesized cDNA. Since each primer includes a T7 polymerase promoter sequence 
[Figure 3 diagram. Labels: control (driver) mRNA; 1st strand cDNA synthesis; mix and 
anneal; mRNAxDNA hybrids; unique cDNA species; hybrids are cross-linked.] 
Figure 3. Chemical cross-linking subtraction. Excess driver mRNA is mixed with 1st strand tester 
cDNA. The common sequences form mRNA:cDNA hybrids which are cross linked with 2,5- 
diaziridinyl-1,4-benzoquinone (DZQ). The remaining single stranded cDNA sequences are differentially 
expressed in the tester. Probes are made from these sequences using Sequenase 2.0 
DNA polymerase, which lacks reverse transcriptase activity and so does not react with the 
remaining mRNA, and are used to screen for clones of differentially expressed genes. 



Table 1. The abundance of mRNA species and classes in a typical mammalian cell. 

mRNA class     Copies of each   No. of mRNA species   Mean % of each     Mean mass (ng) of each 
               species/cell     in class              species in class   species/µg total RNA 

Abundant       12000            4                     3.3                1.65 
Intermediate   300              500                   0.08               0.04 
Rare           15               11000                 0.004              0.002 

Modified from Bertioli et al. (1995). 
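
The last two columns of table 1 can be reproduced from the first two, which makes the table easier to interpret. The sketch below is a check on the arithmetic, not part of the original table. The mass column is consistent with mRNA making up about 5% of total RNA (50 ng mRNA per µg total RNA); that figure is an assumption inferred from the tabulated values, not stated in the text.

```python
# Copies per cell and number of species per class, taken from table 1.
CLASSES = {
    "abundant": {"copies_per_cell": 12000, "n_species": 4},
    "intermediate": {"copies_per_cell": 300, "n_species": 500},
    "rare": {"copies_per_cell": 15, "n_species": 11000},
}

# Assumption (inferred, not stated in the source): mRNA is ~5% of total RNA.
MRNA_NG_PER_UG_TOTAL_RNA = 50.0


def table1_rows(classes, mrna_ng=MRNA_NG_PER_UG_TOTAL_RNA):
    """Return {class: (mean % of each species, mean ng of each species per ug total RNA)}."""
    total_copies = sum(c["copies_per_cell"] * c["n_species"] for c in classes.values())
    rows = {}
    for name, c in classes.items():
        percent = 100.0 * c["copies_per_cell"] / total_copies
        rows[name] = (percent, mrna_ng * percent / 100.0)
    return rows


rows = table1_rows(CLASSES)
for name, (percent, mass_ng) in rows.items():
    print(f"{name}: {percent:.3g}% of mRNA, {mass_ng:.3g} ng per ug total RNA")
```

On these numbers, a transcript present at fewer than 50 copies per cell sits between the rare and intermediate classes, which is the range the CCLS enrichment figures refer to.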
at the 5' end, the final pool of random cDNA fragments is a PCR-renewable cDNA 
population which is representative of the expressed gene pool and can be used to 
synthesize sense RNA for use as driver material. Furthermore, if the final pool of 
random cDNA fragments is reamplified using biotinylated T7 primer and random 
hexamer, the product can be captured with streptavidin beads and the antisense 
strand eluted for use as tester. Since both target and driver can be generated from 
the same DROP product, subtraction can be performed in both directions (i.e. for 
up- and down-regulated species) between two different DROP products. 

Representational Difference Analysis (RDA) 

RDA of cDNA (Hubank and Schatz 1994) is an extension of the technique 
originally applied to genomic DNA as a means of identifying differences between 
two complex genomes (Lisitsyn et al. 1993). It is a process of subtraction coupled to 
amplification involving subtractive hybridization of the tester in the presence of 
excess driver. Sequences in the tester that have homologues in the driver are 
rendered unamplifiable, whereas those genes expressed only in the tester retain the 
ability to be amplified by PCR. The procedure is shown schematically in figure 4. 

In essence, the driver and tester mRNA populations are first converted to cDNA 
and amplified by PCR following the ligation of an adaptor. The adaptors are then 
removed from both populations and a new (different) adaptor ligated to the 
amplified tester population only. Driver and tester populations are next melted and 
hybridized together in a ratio of 100:1. Following hybridization, only tester:tester 
homohybrids carry the adaptor at both ends of the DNA duplex and can, thus, be filled 
in at both 3' ends. Hence, only these molecules are amplified exponentially during 
the subsequent PCR step. Although tester:driver heterohybrids are present, they 
only amplify in a linear fashion, since the strand derived from the driver has no 
adaptor to which the primer can bind. Driver:driver homohybrids have no 
adaptors and, therefore, are not amplified. Single stranded molecules are digested 
with mung bean nuclease before a further PCR-enrichment of the tester:tester 
homohybrids. The adaptors on the amplified tester population are then replaced and 
the process is repeated with an increasing excess of driver (tester:driver ratios of 1:400, 
1:80000 and 1:800000 for the second, third and fourth hybridizations, respectively). 
Different adaptors are ligated to the tester between successive rounds of hybridization and 
amplification to prevent the accumulation of PCR products that might interfere with 
subsequent amplifications. The final display is a series of differentially expressed 
gene products easily observable on an ethidium bromide gel. 
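
The selectivity of RDA rests on the difference between exponential and linear amplification. The toy model below illustrates that difference under idealized assumptions (perfect doubling per cycle, one starting molecule of each hybrid class); the function name and the starting copy number are hypothetical and real PCR efficiencies are lower.

```python
def rda_copies(cycles, start=1):
    """Idealized copy numbers after `cycles` rounds of PCR for each hybrid class.

    tester:tester - adaptor on both strands, so both are primed: doubles each cycle.
    tester:driver - only the tester strand is primed: one new copy per cycle.
    driver:driver - no adaptor, no priming: not amplified.
    """
    return {
        "tester:tester": start * 2 ** cycles,
        "tester:driver": start * (1 + cycles),
        "driver:driver": start,
    }


copies = rda_copies(10)
for hybrid, n in copies.items():
    print(f"{hybrid}: {n}")
```

Even after only ten idealized cycles the tester:tester class outnumbers the tester:driver class by roughly two orders of magnitude, which is why the mung bean nuclease digestion and further PCR rounds enrich tester-specific sequences so strongly.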

The main advantages of RDA are that it offers a reproducible and sensitive 
approach to the analysis of differentially expressed genes. Hubank and Schatz (1994) 
reported that they were able to isolate genes that were differentially expressed in 
substantially less than 1% of the cells from which the tester is derived. Perhaps the 
main drawback is that multiple rounds of ligation, hybridization, amplification and 
digestion are required. The procedure is, therefore, lengthier than many other 
differential display approaches and provides more opportunity for operator-induced 
error to occur. Although the generation of false positives has been noted, this has 
been solved to some degree by O'Neill and Sinclair (1997) through the use of HPLC- 
purified adaptors. These are free of the truncated adaptors which appear to be a 
major source of the false positive bands. A very similar technique to RDA, termed 
linker capture subtraction (LCS), was described by Yang and Sytkowski (1996). 
Figure 4. The representational difference analysis (RDA) technique. Driver and tester cDNA are 
digested with a 4-cutter restriction enzyme such as DpnII. Single stranded products are 
removed with mung bean nuclease, leaving the 'first difference product', as described by 
Lisitsyn et al. (1993) and Hubank and Schatz (1994). 
Suppression Subtractive Hybridization (SSH) 

The most recent adaptation of the SH approach to differential expression 
analysis was first described by Diatchenko et al. (1996) and Gurskaya et al. (1996). 
They reported that a 1000-5000 fold enrichment of rare cDNAs (equivalent to 
isolating mRNAs present at only a few copies per cell) can be obtained without the 
need for multiple hybridizations/subtractions. Instead of physical or chemical 
removal of the common sequences, a PCR-based suppression system is used (see 
figure 5). 

In SSH excess driver cDNA is added to two portions of the tester cDNA which 
have been ligated with different adaptors. A first round of hybridization serves to 
enrich differentially expressed genes and equalize rare and abundant messages. 
Equalization occurs since reannealing is more rapid for abundant molecules than for 
rare molecules due to the second order kinetics of hybridization (James and Higgins 
1985). The two primary hybridization mixes are then mixed together in the presence 
of excess driver and allowed to hybridize further. This step permits the annealing of 
single stranded complementary sequences which did not hybridize in the primary 
hybridization, and in doing so generates templates for PCR amplification. Although 
there are several possible combinations of the single stranded molecules present in 
the secondary hybridization mix, only one particular combination (differentially 
expressed in the tester cDNA, composed of complementary strands having different 
adaptors) can amplify exponentially. 
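
The equalization step follows from the standard second order rate law for reannealing, under which the concentration of a species remaining single stranded after time t is C = C0 / (1 + k·C0·t). The sketch below applies that textbook relation to one abundant and one rare species. The rate constant, time and starting concentrations are arbitrary illustrative values, not measurements from this or any cited study.

```python
def single_stranded(c0, k, t):
    """Concentration still single stranded after time t under second order kinetics.

    c0: starting concentration of the species (arbitrary units)
    k:  reannealing rate constant (arbitrary units, assumed equal for all species)
    t:  hybridization time
    """
    return c0 / (1.0 + k * c0 * t)


# Hypothetical abundant and rare species differing 1000-fold at the start.
abundant0, rare0 = 1000.0, 1.0
k, t = 1.0, 10.0

abundant = single_stranded(abundant0, k, t)
rare = single_stranded(rare0, k, t)

print(f"ratio before hybridization: {abundant0 / rare0:.0f}:1")
print(f"ratio after hybridization:  {abundant / rare:.2f}:1")
```

Because the abundant species reanneals much faster, the single stranded fractions of the two species end up at nearly the same concentration. As noted later in the text, equalization in practice is less complete than this idealized calculation suggests.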

Having obtained the final differential display, two options are available if cloning 
of cDNAs is desired. One is to transform the whole of the final PCR reaction into 
competent cells. Transformed colonies can then be isolated and their inserts 
characterized by sequencing, restriction analysis or PCR. Alternatively, the final 
PCR products can be resolved on a gel and the individual bands excised, reamplified 
and cloned. The first approach is technically simpler and less time consuming. 
However, ligation/transformation reactions are known to be biased towards the 
cloning of smaller molecules, and so the final population of clones will probably not 
contain a representative selection of the larger products. In addition, although 
equalization theoretically occurs, observations in this laboratory suggest that this is 
by no means perfectly accomplished. Consequently, some gene species are present 
in a higher number than others and this will be represented in the final population 
of clones. Thus, in order to obtain a substantial proportion of those gene species that 
actually demonstrate differential expression in the tester population, the number of 
clones that will have to be screened after this step may be substantial. The second 
approach is initially more time consuming and technically demanding. However, it 
would appear to offer better prospects for cloning larger and low abundance gene 
products. In addition, one can incorporate a screening step that differentiates 
different products of different sequences but of the same size (HA-staining, see 
later). In this way, a good idea of the final number of clones to be isolated and 
identified can be achieved. 

An alternative (or even complementary) approach is to use the final differential 
display reaction to screen a cDNA library to isolate full length clones for further 
characterization, or a DNA array (see later) to quickly identify known genes. SSH 
has been used in this laboratory to begin characterization of the short-term gene 
tXP /Z i0n <fZ l tS ,^ e , n2 y me - induc «s such as phenobarbital (Rockett et al. 1997) 
and Wy-14,643 (Rockett et al. unpublished observations). The isolation of 
differentially expressed genes in this manner enables the construction of a fingerprint 
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Figure 6. Flow diagram showing method used in this laboratory to isolate and identify clones of genes 
which are differentially expressed in rat liver following short term exposure to the enzyme 
inducers, phenobarbital and Wy-14,643. 



of expressed genes which are unique to each compound and time/dose point. Such 
information could be useful in short-term characterization of the toxic potential of 
new compounds by comparing the gene-expression profiles they elicit with those 
produced by known inducers. Figure 6 shows a flow diagram of the method used to 
isolate, verify and clone differentially expressed genes, and figure 7 shows expression 
profiles obtained from a typical SSH experiment. Subsequent sub-cloning of the 
individual bands, sequencing and gene database interrogation reveals many genes 
which are either up- or down-regulated by phenobarbital in the rat (tables 2 and 3). 
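
One simple way to compare the profile elicited by a new compound with those of known inducers is a set overlap score. The sketch below uses the Jaccard index as one possible measure; it is not the method used in this laboratory, and the gene and compound names are placeholders rather than data from tables 2 and 3.

```python
def jaccard(a, b):
    """Jaccard index of two gene sets: shared genes divided by all genes in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_references(unknown, references):
    """Rank reference compounds by similarity of their gene set to the unknown's."""
    scores = {name: jaccard(unknown, genes) for name, genes in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder sets of up-regulated genes for two known inducers and one unknown.
references = {
    "inducer_A": {"gene1", "gene2", "gene3", "gene4"},
    "inducer_B": {"gene3", "gene5", "gene6"},
}
unknown = {"gene1", "gene2", "gene3"}

ranking = rank_references(unknown, references)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A fuller comparison would treat up- and down-regulated genes separately and take the time/dose point into account, since the text describes the fingerprint as specific to each.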

One of the advantages in using the SSH approach is that no prior knowledge is 
required of which specific genes are up/down-regulated subsequent to xenobiotic 
exposure, and an almost complete complement of genes is obtained. For example, 
the peroxisome proliferator and non-genotoxic hepatocarcinogen Wy-14,643 
up-regulates at least 28 genes and down-regulates at least 15 in the rat (a sensitive 
species) and produces 48 up- and 37 down-regulated genes in the guinea pig, a 
resistant species (Rockett, Swales, Esdaile and Gibson, unpublished observations). 
One of these genes, CD81, was up-regulated in the rat and down-regulated in the 
guinea pig following Wy-14,643 treatment. CD81 (alternatively named TAPA-1) is 
a widely expressed cell surface protein which is involved in a large number of cellular 
processes, including differentiation (Levy et al. 1998). Since all of these functions 
are altered to some extent in the phenomena of hepatomegaly and non-genotoxic 
hepatocarcinogenesis, it is intriguing, and probably mechanistically relevant, that 
CD81 expression is differentially regulated in a resistant and a susceptible species. 
However, although the majority of genes can be sequenced and matched to database 
sequences, the latter are predominantly expressed sequence tags or genes of completely 
unknown function, thus partially obscuring a realistic overall assessment of the 
critical genes of genuine biological interest. Notwithstanding the lack of complete 
functional identification of altered gene expression, such gene profiling studies 
essentially provide a 'molecular fingerprint' in response to xenobiotic challenge, 
and a mechanistically relevant platform for further detailed investigation. 



Differential Display (DD) 

Originally described as 'RNA fingerprinting by arbitrarily primed PCR' (Liang 
and Pardee 1992), this method is now more commonly referred to as 'differential 
Table 2. Genes up-regulated in rat liver following 3-day exposure to phenobarbital. 

Band number                 Highest sequence    FASTA-EMBL gene identification 
(approximate size in bp)    similarity 

5 (1300)                    93.5%               CYP2B1 
7 (1000)                    95.1%               Preproalbumin; serum albumin mRNA 
8 (950)                     98.3%               NCI-CGAP-Pr1 H. sapiens (EST) 
10 (850)                    95.7%               CYP2B1 
11 (800)        Clone 1     94.9%               CYP2B1 
                Clone 2     75.3%               CYP2B2 
12 (750)                    93.8%               TRPM-2 mRNA; sulfated glycoprotein 
15 (600)                    92.9%               Preproalbumin; serum albumin mRNA 
16 (55)         Clone 1     95.2%               CYP2B1 
                Clone 2     93.6%               Haptoglobulin mRNA partial alpha 
21 (350)                    99.3%               18S, 5.8S & 28S rRNA 

Bands 1-4, 6, 9, 13, 14, and 17-20 were shown to be false positives by dot blot analysis and therefore 
were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes do not 
represent the complete spectrum of genes which are up-regulated in rat liver by phenobarbital, but 
simply represent the genes sequenced and identified to date. 



Table 3. Genes down-regulated in rat liver following 3-day exposure to phenobarbital. 

Band number                 Highest sequence    FASTA-EMBL gene identification 
(approximate size in bp)    similarity 

1 (1500)                    95.3%               3-oxoacyl-CoA thiolase 
2 (1200)                    92.3%               Hemopexin mRNA 
3 (1000)                    91.7%               Alpha-2u-globulin mRNA 
7 (700)         Clone 1     77.2%               M. musculus C1 inhibitor 
                Clone 2     94.5%               Electron transfer flavoprotein 
                Clone 3     91.0%               M. musculus Topoisomerase 1 (Topo 1) 
8 (650)         Clone 1     86.9%               Soares 2NbMT M. musculus (EST) 
                Clone 2     96.2%               Alpha-2u-globulin (s-type) mRNA 
9 (600)         Clone 1     86.9%               Soares mouse NML M. musculus (EST) 
                Clone 2     82.0%               Soares p3NMF19.5 M. musculus (EST) 
10 (550)                    73.8%               Soares mouse NML M. musculus (EST) 
11 (525)                    95.7%               NCI-CGAP-Pr1 H. sapiens (EST) 
12 (375)                    100.0%              Ribosomal protein 
13 (23)         Clone 1     97.2%               Soares mouse embryo NbME135 (EST) 
                Clone 2     100.0%              Fibrinogen B-beta-chain 
                Clone 3     100.0%              Apolipoprotein E gene 
14 (170)                    96.0%               Soares p3NMF19.5 M. musculus (EST) 
15 (140)                    97.3%               Stratagene mouse testis (EST) 
Others: (300)               96.7%               R. norvegicus RASP 1 mRNA 
        (275)               93.1%               Soares mouse mammary gland (EST) 

EST = Expressed sequence tag. Bands 4-6 were shown to be false positives by dot blot analysis and, 
therefore, were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes 
do not represent the complete spectrum of genes which are down-regulated in rat liver by phenobarbital, 
but simply represent the genes sequenced and identified to date. 



display' (DD). In this method, all the mRNA species in the control and treated cell 
populations are amplified in separate reactions using reverse transcriptase-PCR 
(RT-PCR). The products are then run side-by-side on sequencing gels. Those 
bands which are present in one display only, or which are much more intense in one 
display compared to the other, are differentially expressed and may be recovered for 
further characterization. One advantage of this system is the speed with which it can 
be carried out: 2 days to obtain a display and as little as a week to make and identify 
clones. 

Two commonly used variations are based on different methods of priming the 
reverse transcription step (figure 8). One is to use an oligo dT with a 2-base 'anchor' 
at the 3'-end, e.g. 5' (dT11)CA 3' (Liang and Pardee 1992). Alternatively, an 
arbitrary primer may be used for 1st strand cDNA synthesis (Welsh et al. 1992). 
This variant of RNA fingerprinting has also been called 'RAP' (RNA Arbitrarily 
Primed)-PCR. One advantage of this second approach is that PCR products may be 
derived from anywhere in the RNA, including open reading frames. In addition, it 
can be used for mRNAs that are not polyadenylated, such as many bacterial mRNAs 
(Wong and McClelland 1994). In both cases, following reverse transcription and 
denaturation, second strand cDNA synthesis is carried out with an arbitrary primer 
(arbitrary primers have a single base at each position, as compared to random 
primers, which contain a mixture of all four bases at each position). The resulting 
PCR thus produces a series of products which, depending on the system (primer 
length and composition, polymerase and gel system), usually includes 50-100 
products per primer set (Band and Sager 1989). When a combination of different 
dT-anchors and arbitrary primers is used, almost all mRNA species from a cell can 
be amplified. When the cDNA products from two different populations are analysed 
side by side on a polyacrylamide gel, differences in expression can be identified and 
the appropriate bands recovered for cloning and further analysis. 
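The combinatorial nature of the primer sets described above can be illustrated with a short sketch. It assumes a dT run of 11 bases and anchor bases drawn from G, C and A (as stated in the legend to figure 8); the figure of 20 arbitrary primers is purely hypothetical, and the function names are introduced here for illustration only. The band totals are a rough upper guide, not a count of distinct mRNAs, since one mRNA may give rise to several bands.

```python
from itertools import product


def anchored_primers(dt_length=11, anchor_bases="GCA"):
    """Return oligo(dT) primers carrying every possible 2-base 3' anchor."""
    return ["T" * dt_length + "".join(pair)
            for pair in product(anchor_bases, repeat=2)]


def expected_bands(n_anchored, n_arbitrary, per_set=(50, 100)):
    """Range of total bands, assuming 50-100 products per primer set."""
    n_sets = n_anchored * n_arbitrary
    return n_sets * per_set[0], n_sets * per_set[1]


primers = anchored_primers()
# 20 arbitrary primers is an assumed, illustrative number.
low, high = expected_bands(len(primers), n_arbitrary=20)
print(len(primers), "anchored primers;", low, "to", high, "bands in total")
```

Under these assumptions, nine anchored primers combined with 20 arbitrary primers give 180 primer sets, which is of the same order as the number of sets needed to sample most of the mRNA species in a cell.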

Although DD is perhaps the most popular approach used today for identifying 
differentially expressed genes, it does suffer from several perceived disadvantages: 

(1) It may have a strong bias towards high copy number mRNAs (Bertioli et al. 
1995), although this has been disputed (Wan et al. 1996) and the isolation of very 
low abundance genes may be achieved in certain circumstances (Guimeraes et 
al. 1995a). 

(2) The cDNAs obtained often only represent the extreme 3' end of the mRNA 
(often the 3'-untranslated region), although this may not always be the case 
(Guimeraes et al. 1995a). Since the 3' end is often not included in GenBank and 
shows variation between organisms, cDNAs identified by DD cannot always be 
matched with their genes, even if they have been identified. 

(3) The pattern of differential expression seen on the display often cannot be 
reproduced on Northern blots, with false positives arising in up to 70% of cases 
(Sun et al. 1994). Some adaptations have been shown to reduce false positives, 
including the use of two reverse transcriptases (Sung and Denman 1997), 
comparison of uninduced and induced cells over a time course (Burn et al. 1994) 
and comparison of DD-PCR products from two uninduced and two induced 
lines (Sompayrac et al. 1995). The latter authors also reported that the use of 
cytoplasmic RNA rather than total RNA reduces false positives arising from 
nuclear RNA that is not transported to the cytoplasm. 

Further details of the background, strengths and weaknesses of the DD 
technique can be obtained from a review by McClelland et al. (1996) and from 
articles by Liang et al. (1995) and Wan et al. (1996). 
Figure 8. Two approaches to differential display (DD) analysis. 1st strand synthesis can be carried out 
either with a polydT11NN primer (where N = G, C or A) or with an arbitrary primer. The use of 
different combinations of G, C and A to anchor the first strand polydT primer enables the priming 
of the majority of polyadenylated mRNAs. Arbitrary primers may hybridize at none, one or more 
places along the length of the mRNA, allowing 1st strand cDNA synthesis to occur at none, one 
or more points in the same gene. In both cases, 2nd strand synthesis is carried out with an arbitrary 
primer. Since these arbitrary primers for the 2nd strand may also hybridize to the 1st strand cDNA 
in a number of different places, several different 2nd strand products may be obtained from one 
binding point of the 1st strand primer. Following 2nd strand synthesis, the original set of primers 
is used to amplify the second strand products, with the result that numerous gene sequences are 
amplified. 



Restriction endonuclease-facilitated analysis of gene expression 
Serial Analysis of Gene Expression (SAGE) 

A more recent development in the field of differential display is SAGE analysis 
(Velculescu et al. 1995). This method uses a different approach to those discussed so 
far and is based on two principles. Firstly, in more than 95% of cases, short 
nucleotide sequences ('tags') of only nine or 10 base pairs provide sufficient 
information to identify their gene of origin. Secondly, concatenation (linking 
together in a series) of these tags allows sequencing of multiple cDNAs within a 
single clone. Figure 9 shows a schematic representation of the SAGE process. In this 
procedure, double stranded cDNA from the test cells is synthesized with a 
biotinylated polydT primer. Following digestion with a commonly cutting (4 bp 
recognition sequence) restriction enzyme ('anchoring enzyme'), the 3' ends of the 
cDNA population are captured with streptavidin beads. The captured population is 
split into two and different adaptors ligated to the 5' ends of each group. Incorporated 
into the adaptors is a recognition sequence for a type IIS restriction enzyme, one 
which cuts DNA at a defined distance (< 20 bp) from its recognition sequence. 
Hence, following digestion of each captured cDNA population with the IIS enzyme, 
the adaptors plus a short piece of the captured cDNA are released. The two 
populations are then ligated and the products amplified. The amplified products are 
cleaved with the original anchoring enzyme, religated (concatemers are formed in 
the process) and cloned. The advantage of this system is that hundreds of gene tags 
can be identified by sequencing only a few clones. Furthermore, the number of times 
a given transcript is identified is a quantitative measurement of that gene's 
abundance in the original population, a feature which facilitates identification of 
differentially expressed genes in different cell populations. 
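The counting step that makes SAGE quantitative may be sketched as follows. This is a minimal illustration rather than the published analysis software: it assumes a CATG anchoring site (the NlaIII site commonly used in SAGE), a fixed tag length of nine bases, and error-free reads, and the example concatemer is invented. Each ditag lying between two anchoring sites contributes one tag read from its left end and one read, as the reverse complement, from its right end.

```python
from collections import Counter

ANCHOR = "CATG"  # assumed anchoring-enzyme site
TAG_LEN = 9      # assumed tag length (the text quotes nine or 10 bp)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tags_from_concatemer(concatemer):
    """Split a sequenced concatemer at anchoring sites and return its tags."""
    tags = []
    for ditag in concatemer.split(ANCHOR):
        if len(ditag) < 2 * TAG_LEN:
            continue  # flanking sequence or an incomplete ditag
        tags.append(ditag[:TAG_LEN])                       # left-hand tag
        tags.append(reverse_complement(ditag[-TAG_LEN:]))  # right-hand tag
    return tags


def tag_counts(concatemers):
    """Count how often each tag occurs across all sequenced clones."""
    counts = Counter()
    for concatemer in concatemers:
        counts.update(tags_from_concatemer(concatemer))
    return counts


# Invented example clone containing two ditags.
clone = "CATG" + "AAAAAAAAA" + "CCCCCCCCC" + "CATG" + "AAAAAAAAA" + "GGGGGGGGT" + "CATG"
counts = tag_counts([clone])
print(counts.most_common())
```

Running the same count on control and treated libraries and comparing the tallies for each tag gives the relative abundance of the corresponding transcripts in the two populations.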

Some disadvantages of SAGE analysis include the technical difficulty of the 
method, the large amount of accurate sequencing required and a bias towards abundant 
mRNAs. In addition, it has not been validated in the pharmaco/toxicogenomic setting 
and has only been used to examine well known tissue differences to date. 



Gene Expression Fingerprinting (GEF) 

A different capture/restriction digest approach for isolating differentially 
expressed genes has been described by Ivanova and Belyavsky (1995). In this 
method, RNA is converted to cDNA using biotinylated oligo(dT) primers. The 
cDNA population is then digested with a specific endonuclease and captured with 
magnetic streptavidin microbeads to facilitate removal of the unwanted 5' digestion 
products. The use of restricted 3 '-ends alone serves to reduce the complexity of the 
cDNA fragment pool and helps to ensure that each RNA species is represented by 
not more than one restriction product. An adaptor is ligated to facilitate subsequent 
amplification of the captured population. PCR is carried out with one adaptor- 
specific and one biotinylated polydT primer. The reamplified population is 
recaptured and the non-biotinylated strands removed by alkaline dissociation. The 
non-biotinylated strand is then resynthesized using a different adaptor-specific 
primer in the presence of a radiolabelled dNTP. The labelled immobilized 3' cDNA 
ends are next sequentially treated with a series of different restriction endonucleases 
and the products from each digestion analysed by PAGE. The result is a fingerprint 
composed of a number of ladders (equal to the number of sequential digests used). 
By comparing test versus control fingerprints, it is possible to identify differentially 
expressed products which can then be isolated from the gel and cloned. The 
advantages of this procedure are that it is very robust and reproducible and the 
authors estimate that 80-93% of cDNA molecules are involved in the final 
fingerprint. The disadvantage is that polyacrylamide gels can rarely resolve more 
than 300-400 bands, which compares poorly to the 1000 or more which are 
estimated to be produced in an average experiment. The use of 2-D gels such as 
those described by Uitterlinden et al. (1989) and Hatada et al. (1991) may help to 
overcome this problem. 

A similar method for displaying restriction endonuclease fragments was later 
described by Prashar and Weissman (1996). However, instead of sequential 
digestion of the immobilized 3'-terminal cDNA fragments, these authors simply 
compared the profiles of the control and treated populations without further 
manipulation. 
Figure 9. Serial analysis of gene expression (SAGE) analysis. cDNA is cleaved with an anchoring enzyme 
(AE) and the 3' ends captured using streptavidin beads. The cDNA pool is divided in half and each 
portion ligated to a different linker, each containing a type IIS restriction site (tagging enzyme, 
TE). Cleavage with the TE releases the linker plus a short length of cDNA 
(XXXXX and OOOOO indicate nucleotides of different tags). The two pools of tags are then 
ligated and amplified. Following PCR, the products are cleaved with 
the AE and the ditags isolated from the linkers using PAGE. The ditags are then ligated (during 
DNA arrays 

'Open' differential display systems are cumbersome in that it takes a great deal 
of time to extract and identify candidate genes and then confirm that they are indeed 
up- or down-regulated in the treated compared to the control tissue. Normally the 
latter process is carried out using Northern blotting or RT-PCR. Even so, each of 
the aforementioned steps produces a bottleneck to the ultimate goal of rapid analysis 
of gene expression. These problems will likely be addressed by the development of 
so-called DNA arrays (e.g. Gress et al. 1992, Zhao et al. 1995, Schena et al. 1996), 
the introduction of which has signalled the next era in differential gene expression 
analysis. DNA arrays consist of a gridded membrane or glass 'chips' containing 
hundreds or thousands of DNA spots, each consisting of multiple copies of part of 
a known gene. The genes are often selected based on previously proven involvement 
in oncogenesis, cell cycling, DNA repair, development and other cellular processes. 
They are usually chosen to be as specific as possible for each gene and animal species. 
Human and mouse arrays are already commercially available and a few companies 
will construct a personalized array to order, for example Clontech Laboratories and 
Research Genetics Inc. The technique is rapid in that hundreds or even thousands 
of genes can be spotted on a single array, and that mRNA/cDNA from the test 
populations can be labelled and used directly as probe. When analysed with 
appropriate hardware and software, arrays offer a rapid and quantitative means to 
assess differences in gene expression between two cell populations. Of course, there 
can only be identification and quantitation of those genes which are in the array 
(hence the term 'closed' system). Therefore, one approach to elucidating the 
molecular mechanisms involved in a particular disease/development system may be 
to combine an open and closed system: a DNA array to directly identify and 
quantitate known genes in the mRNA population, and an open 
system such as SSH to isolate unknown genes which are differentially expressed. 
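The quantitative comparison that an array permits can be sketched in a few lines. The example below is illustrative only: the spot intensities are invented, the two-fold cut-off and the background floor are assumptions, and real array analysis requires normalization and replicate measurements that are omitted here.

```python
import math


def log2_ratios(control, treated, floor=1.0):
    """log2(treated/control) for each gene present on both arrays.

    Signals below `floor` are raised to `floor` so that background-level
    spots cannot produce extreme ratios (an assumed, simplistic correction).
    """
    ratios = {}
    for gene in control.keys() & treated.keys():
        c = max(control[gene], floor)
        t = max(treated[gene], floor)
        ratios[gene] = math.log2(t / c)
    return ratios


def differentially_expressed(ratios, min_fold=2.0):
    """Return (up, down) gene lists that change by at least `min_fold`."""
    cutoff = math.log2(min_fold)
    up = sorted(g for g, r in ratios.items() if r >= cutoff)
    down = sorted(g for g, r in ratios.items() if r <= -cutoff)
    return up, down


# Invented spot intensities, for illustration only.
control = {"CYP2B1": 100.0, "albumin": 800.0, "ApoE": 400.0}
treated = {"CYP2B1": 1600.0, "albumin": 900.0, "ApoE": 100.0}
up, down = differentially_expressed(log2_ratios(control, treated))
print("up:", up, "down:", down)
```

Only genes spotted on both arrays are compared, which reflects the 'closed' nature of the system: a gene absent from the array cannot appear in either list.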

One of the main advantages of DNA arrays is the huge number of gene fragments 
which can be put on a membrane; some companies have reported gridding up to 
60000 spots on a single glass 'chip' (microscope slide). These high density 
chip-based micro-arrays will probably become available as mass-produced off-the-shelf 
items in the near future. This should facilitate the more rapid determination of 
differential expression in time and dose-response experiments. Aside from their 
high cost and the technical complexities involved in producing and probing DNA 
arrays, the main problem which remains, especially with the newer micro-array 
(gene-chip) technologies, is that results are often not wholly reproducible between 
experiments, although this problem is expected to be resolved with time. 



EST databases as a means to identify differentially expressed genes 

Expressed sequence tags (ESTs) are partial sequences of clones obtained from 
cDNA libraries. Even though most ESTs have no formal identity (putative 
identification is the best to be hoped for), they have proven to be a rapid and efficient 
means of discovering new genes and can be used to generate profiles of gene 
expression in specific cells. Since they were first described by Adams et al. (1991), 
there has been a huge explosion in EST production and it is estimated that there are 
now well over a million such sequences in the public domain, representing over half 
of all human genes (Hillier et al. 1996). This large number of freely available 
sequences (both sequence information and clones are normally available royalty-free 
from the originators) has enabled the development of a new approach towards 
differential gene expression analysis, as described by Vasmatzis et al. (1998). The 
approach is simple in theory: EST databases are first searched for genes that have a 
number of related EST sequences from the target tissue of choice, but none or few 
from non-target tissue libraries. Programmes to assist in the assembly of such sets of 
overlapping data may be developed in-house or obtained privately or from the 
internet. For example, the Institute for Genomic Research (TIGR, found at 
http://www.tigr.org) provides many software tools free of charge to the scientific 
community. Included amongst these is the TIGR assembler (Sutton et al. 1995), a 
tool for the assembly of large sets of overlapping data such as ESTs, bacterial 
artificial chromosomes (BACs), or small genomes. Candidate EST clones representing 
different genes are then analysed using RNA blot methods for size and tissue 
specificity and, if required, used as probes to isolate and identify the full length 
cDNA clone for further characterization. In practice, however, the method is rather 
more involved, requiring bioinformatic and computer analysis coupled with 
confirmatory molecular studies. Vasmatzis et al. (1998) have described several 
problems in this fledgling approach, such as separating highly homologous 
sequences derived from different genes and an overemphasis of specificity for some 
EST sequences. However, since these problems will largely be addressed by the 
development of more suitable computer algorithms and an increased completeness 
of the EST database, it is likely that this approach to identifying differentially 
expressed genes may enjoy more patronage in the future. 
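The database search at the heart of this approach can be sketched as a simple filter. This is not the procedure of Vasmatzis et al. (1998); it is a minimal illustration assuming that ESTs have already been grouped into gene clusters and annotated with their library tissue. The cluster names, tissues and thresholds are all hypothetical.

```python
from collections import defaultdict


def est_counts_by_gene(est_records):
    """Tally ESTs per gene cluster and tissue.

    est_records is an iterable of (gene_cluster_id, tissue) pairs,
    one pair per EST in the database.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for gene, tissue in est_records:
        counts[gene][tissue] += 1
    return counts


def tissue_enriched(counts, target, min_target=3, max_other=1):
    """Genes with several ESTs from the target tissue and few from elsewhere."""
    hits = []
    for gene, by_tissue in counts.items():
        in_target = by_tissue.get(target, 0)
        elsewhere = sum(n for t, n in by_tissue.items() if t != target)
        if in_target >= min_target and elsewhere <= max_other:
            hits.append(gene)
    return sorted(hits)


# Hypothetical records: (gene cluster, tissue of the source library).
records = (
    [("cluster_A", "liver")] * 4
    + [("cluster_B", "liver")] * 3 + [("cluster_B", "kidney")] * 5
    + [("cluster_C", "liver")]
)
hits = tissue_enriched(est_counts_by_gene(records), target="liver")
print(hits)
```

The thresholds embody the trade-off noted in the text: set too loosely, highly homologous sequences from different genes or widely expressed genes pass the filter; set too strictly, genuine but sparsely sampled candidates are lost.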



Problems and potential of differential expression techniques 
The holistic or single cell approach ? 

When working with in vivo models of differential expression, one of the first 
issues to consider must be the presence of multiple cell types in any given specimen. 
For example, a liver sample is likely to contain not only hepatocytes, but also 
(potentially) Ito cells, bile ductule cells, endothelial cells, various immune cells (e.g. 
lymphocytes, macrophages and Kupffer cells) and fibroblasts. Other tissues will 
each have their own distinctive cell populations. Also, in the case of neoplastic tissue, 
there are almost always normal, hyperplastic and/or dysplastic cells present in a 
sample. One must, therefore, be aware that genes obtained from a differential 
display experiment performed on an animal tissue model may not necessarily arise 
exclusively from the intended 'target' cells, e.g. hepatocytes/neoplastic cells. If 
appropriate, further analyses using immunohistochemistry, in situ hybridization or 
in situ RT-PCR should be used to confirm which cell types are expressing the 
gene(s) of interest. This problem is probably most acute for those studying the 
differential expression of genes in the development of different cell types, where 
there is a need to examine homologous cell populations. The problem is now being 
addressed at the National Cancer Institute (Bethesda, MD, USA), where new 
micro-dissection techniques have been employed to assist in their gene analysis programme, 
the Cancer Genome Anatomy Project (CGAP) (for more information see web site: 
http://www.ncbi.nlm.nih.gov/ncicgap/intro.html). There are also separation 
techniques available that utilise cell-specific antigens as a means to isolate target cells, 
e.g. fluorescence activated cell sorting (FACS) (Dunbar et al. 1998, Kas-Deelen et 
al. 1998) and magnetic bead technology (Richard et al. 1998, Rogler et al. 1998). 

However, those taking a holistic approach may consider this issue unimportant. 
There is an equally appropriate view that all those genes showing altered expression 
within a compromised tissue should be taken into consideration. After all, since all 
tissues are complex mixes of different, interacting cell types which intimately 
regulate each other's growth and development, it is clear that each cell type could in 
some way contribute (positively or negatively) towards the molecular mechanisms 
which lie behind responses to external stimuli or neoplastic growth. It is perhaps 
then more informative to carry out differential display experiments using in vivo as 
opposed to in vitro models, where uniform populations of identical cells probably 
represent a partial, skewed or even inaccurate picture of the molecular changes that 



The incidence and possible implications of inter-individual biological variation 
should be considered in any approach where whole animal models are being used. It 
is well established that individuals, both humans and animals, respond in different 
ways to identical stimuli. One of the best characterized examples is the debrisoquine 
oxidation polymorphism, which is mediated by cytochrome CYP2D6 and determines the 
pharmacokinetics of many commonly prescribed drugs (Lennard 1993, Meyer and 
Zanger 1997). The reasons for such differences are varied and complex, but allelic 
variations, regulatory region polymorphisms and even physical and mental health 
can all contribute to observed differences in individual responses. Careful thought 
should, therefore, be given to the specific objectives of the study and to the possible 
value of pooling starting material (tissue/mRNA). The effect of this can be 
beneficial through the ironing out of exaggerated responses and unimportant minor 
fluctuations of (mechanistically) irrelevant genes in individual animals, thus 
providing a clearer overall picture of the general molecular mechanisms of the 
response. However, at the same time such minor variations may be of utmost 
importance in deciding the ability of individual animals to succumb to or resist the 
effects of a given chemical/disease. 



How efficient are differential expression techniques at recovering a high percentage of 
differentially expressed genes? 

A number of groups have produced experimental data suggesting that mammalian 
cells produce between 8000 and 15 000 different mRNA species at any one time 
(Hedrick et al. 1984, Bravo 1990), although figures as 
high as 20-30000 have also been quoted (Axel et al. 1976). Hedrick et al. (1984) 
provided evidence suggesting that the majority of these belong to the rare abundance 
class. A breakdown of this abundance distribution is shown in table 1. 

When the results of differential display experiments have been compared with 
data obtained previously using other methods, it is apparent that not all differentially 
expressed mRNAs are represented in the final display. In particular, rare messages 
(which, importantly, often include regulatory proteins) are not easily recovered 
using differential display systems. This is a major shortcoming, as the majority of 
mRNA species exist at levels of less than 0.005% of the total population (table 1). 
Bertioli et al. (1995) examined the efficiency of DD templates (heterogeneous 
mRNA populations) for recovering rare messages and were unable to detect mRNA 
species present at less than 1.2% of the total mRNA population, equivalent to an 
intermediate or abundant species. Interestingly, when simple model systems (single 
target only) were used instead of a heterogeneous mRNA population, the same 
primers could detect levels of target mRNA 10 000-fold lower. These results 
are probably best explained by competition for substrates from the many PCR 
products produced in a DD reaction. 
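The size of the gap between these figures is easily worked through. The short calculation below uses only the numbers quoted above (the 0.005% level of the rare class, the 1.2% detection limit in a heterogeneous population and the 10 000-fold gain in the single-target model); the variable names are introduced here for illustration.

```python
from fractions import Fraction

# All values are percentages of the total mRNA population, as quoted in the text.
RARE_CLASS_LEVEL = Fraction("0.005")  # upper level of the rare abundance class
DD_LIMIT_COMPLEX = Fraction("1.2")    # detection limit in a heterogeneous population
MODEL_SYSTEM_GAIN = 10_000            # improvement seen with a single-target model

# How far short of the rare class does DD fall in a complex population?
shortfall = DD_LIMIT_COMPLEX / RARE_CLASS_LEVEL

# Detection limit implied for the simple model system.
model_limit = DD_LIMIT_COMPLEX / MODEL_SYSTEM_GAIN

print(f"shortfall: {int(shortfall)}-fold")
print(f"model-system limit: {float(model_limit)}% of total mRNA")
```

On these figures the detection limit in a complex population lies 240-fold above the level of the rare class, whereas the limit implied for the single-target model (0.00012%) lies well below it, consistent with the view that competition among products, rather than primer performance, is limiting.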

The numbers of differentially expressed mRNAs reported in the literature using 
various model systems provide further evidence that many differentially expressed 
mRNAs are not recovered. For example, DeRisi et al. (1997) used DNA array 
technology to examine gene expression in yeast following exhaustion of sugar in the 
medium and found that more than 1700 genes showed a change in expression of at 
least 2-fold. Given such a finding, it would not be unreasonable to suggest that, 
of the mRNA species produced by a given mammalian cell, 
up to 1000 or more may show altered expression following chemical stimulation. 
Whilst this may be an extreme figure, it is known that at least 100 genes are 
up-regulated in Jurkat (T-) cells following IL-2 stimulation (Ullman et al. 
1990). In addition, Wan et al. (1996) estimated that interferon-γ-stimulated HeLa 
cells differentially express up to 433 genes (assuming 24000 distinct mRNAs 
expressed by the cells). However, there have been few publications documenting 
anywhere near the recovery of these numbers. For example, in using DD to compare 
normal and regenerating mouse liver, Bauer et al. (1993) found only 70 of 38000 
total bands to be different. Of these, 50% (35 genes) were shown to correspond to 
differentially expressed bands. Chen et al. (1996) reported 10 genes upregulated in 
female rat liver following ethinyl estradiol treatment. McKenzie and Drake (1997) 
identified 14 different gene products whose expression was altered by phorbol 
myristate acetate (PMA, a tumour promoter agent) stimulation of a human 
myelomonocytic cell line. Kilty and Vickers (1997) identified 10 different gene 
products whose expression was upregulated in the peripheral blood leukocytes of 
allergic disease sufferers. Linskens et al. (1995) found 23 genes differentially 
expressed between young and senescent fibroblasts. Techniques other than DD 
have also provided an apparent paucity of differentially expressed genes. Using SH, 
for example, Cao et al. (1997) found 15 genes differentially expressed in colorectal 
cancer compared to normal mucosal epithelium. Fitzpatrick et al. (1995) isolated 17 
genes upregulated in rat liver following treatment with the peroxisome proliferator, 
clofibrate; Philips et al. (1990) isolated 12 cDNA clones which were upregulated in 
highly metastatic mammary adenocarcinoma cell lines compared to poorly 
metastatic ones. Prashar and Weissman (1996) used 3' restriction fragment analysis and 
identified approximately 40 genes showing altered expression within 4 h of 
activation of Jurkat T-cells. Groenink and Leegwater (1996) analysed 27 gene 
fragments isolated using SSH of delayed early response phase of liver regeneration 
and found only 12 to be upregulated. 

In the laboratory, SSH was used to isolate up to 70 candidate genes which appear 
to show altered expression in guinea pig liver following short-term treatment with 
the peroxisome proliferates WY-14,643 (Rockett, Swales, Esdaile and Gibson 
unpublished observations). However, these findings have still to be confirmed by 
ana^sis of the extracted tissue mRNA for differential expression of these sequences 
Whilst the latest deferential display technologies are purported to include design 
and experimental modifications to overcome this lack of efficiency (in both the total 
number of differentially expressed genes recovered and the percentage that are true 



678* 



positives), it is still not clear if such adaptations are practically effective— proving 
-efficiency by-spiking" -with a" known amount delimited numbers of artificial 
construct(s) is one thing, but isolating a high percentage of the rare messages already 
present in an mRNA population is another. Of course, some models will genuinely 
produce only a small number of differentially expressed genes. In addition, there are 
also technical problems that can reduce efficiency. For example, mRNAs may have 
an unusual primary structure that effectively prevents their amplification by PCR- 
based systems. In addition, it is known that under certain circumstances not all 
mRNAs have 3'polyA sites. For example, during Xenopus development, deadenyl- 
ation is used as a means to stabilize RNAs (Voeltz and Steitz 1998), whilst 
preferential deadenylation may play a role in regulating Hsp70 (and perhaps, 
therefore, other stress protein) expression in Drosophila (Dellavalle et al. 1994). The 
presence of deadenylated mRNAs would clearly reduce the efficiency of systems 
utilizing a polydT reverse transcription step. The efficiency of any system also 
depends on the quality of the starting material. All differential display techniques 
use mRNA as their target material. However, it is difficult to isolate mRNA that is 
completely free of ribosomal RNA. Even if polydT primers are used to prime first 
strand cDNA synthesis, ribosomal RNA is often transcribed to some degree 
(Clontech PCR-Select cDNA Subtraction kit user manual). It has been shown, at 
least in the case of SSH, that a high rRNA:mRNA ratio can lead to inefficient 
subtractive hybridization (Clontech PCR-Select cDNA Subtraction kit user 
manual), and there is no reason to suppose that it will not do likewise in other SH 
approaches. Finally, those techniques that utilise a presubtraction amplification step 
(e.g. RDA) may present a skewed representation since some sequences amplify 
better than others. 

Of course, probably the most important consideration is the temporal factor. It 
is clear that any given differential display experiment can only interrogate a cell at 
one point in time. It may well be that a high percentage of the genes showing altered 
expression at that time are obtained. However, given that disease processes and 
responses to environmental stimuli involve dynamic cascades of signalling, 
regulation, production and action, it is clear that all those genes which are switched 
on/off at different times will not be recovered and, therefore, vital information may 
well be missed. It is, therefore, imperative to obtain as much information about the 
model system beforehand as possible, from which a strategy can be derived for 
targeting specific time points or events that are of particular interest to the 
investigator. One way of getting round this problem of single time point analysis is 
to conduct the experiment over a suitable time course which, of course, adds 
substantially to the amount of work involved. 



How sensitive are differential expression technologies? 

There has been little published data that addresses the issue of how large the 
change in expression must be for it to permit isolation of the gene in question with 
the various differential expression technologies. Although the isolation of genes 
whose expression is changed as little as 1.5-fold has been reported using SSH 
(Groenink and Leegwater 1996), it appears that those demonstrating a change in 
excess of 5-fold are more likely to be picked up. Thus, there is a 'grey zone' 
in between where small changes could fade in and out of isolation between 
experiments and animals. DD, on the other hand, is not subject to this grey 
zone since, unlike SH approaches, it does not amplify the difference in expression 
between two samples. Wan et al. (1996) reported that differences in expression of 
twofold or more are detectable using DD. 
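
The thresholds mentioned in this section can be summarised in a small sketch. The Python below is illustrative only: the function names and the treatment of down-regulation as the reciprocal fold change are assumptions, and the cut-offs (1.5-fold and 5-fold for SSH, twofold for DD) are simply the figures quoted above, not validated detection limits.

```python
def change_magnitude(fold_change: float) -> float:
    """Magnitude of a change, so that 0.2-fold (down) is treated like 5-fold (up)."""
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return max(fold_change, 1.0 / fold_change)


def ssh_detection_zone(fold_change: float, lower: float = 1.5, upper: float = 5.0) -> str:
    """Place a fold change relative to the SSH 'grey zone' described in the text."""
    magnitude = change_magnitude(fold_change)
    if magnitude < lower:
        return "unlikely"
    if magnitude <= upper:
        return "grey zone"
    return "likely"


def dd_detectable(fold_change: float, threshold: float = 2.0) -> bool:
    """True if the change meets the twofold figure reported for DD."""
    return change_magnitude(fold_change) >= threshold


for fc in (1.2, 3.0, 8.0):
    print(fc, ssh_detection_zone(fc), dd_detectable(fc))
```

Treating the zone boundaries as hard cut-offs is a simplification; the text describes a gradual loss of reproducibility rather than a sharp limit.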

Resolution and visualization of differential expression products 

It seems highly improbable with current technology that a gel system could be 
developed that is able to resolve all gene species showing altered expression in any 
given test system (be it SH- or DD-based). Polyacrylamide gel electrophoresis 
(PAGE) can resolve size differences down to 0.2% (Sambrook et al. 1989) and is 
used as standard in DD experiments. Even so, it is clear that a complex series of gene 
products such as those seen in a DD will contain unresolvable components. Thus, 
what appears to be one band in a gel may in fact turn out to be several. Indeed, it has 
been well documented (Mathieu-Daude et al. 1996, Smith et al. 1997) that a single 
band extracted from a DD often represents a composite of heterogeneous products, 
and the same has been found for SSH displays in this laboratory (Rockett et al. 
1997). One possible solution was offered by Mathieu-Daude et al. (1996), who 
extracted and reamplified candidate bands from a DD display and used single strand 
conformation polymorphism (SSCP) analysis to confirm which components 
represented the truly differentially expressed product. 

Many scientists often try to avoid the use of PAGE where possible because it is 
technically more demanding than agarose gel electrophoresis (AGE). Unfortunately, 
high resolution agarose gels such as Metaphor (FMC, Lichfield, UK) and AquaPor 
HR (National Diagnostics, Hessle, UK), whilst easier to prepare and manipulate 
than PAGE, can only separate DNA sequences which differ in size by around 
1.5-2% (15-20 base pairs for a 1 kb fragment). Thus, SSH, RDA or other such 
products which differ in size by less than this amount are normally not resolvable. 
However, a simple technique does in fact exist for increasing the resolving power of 
AGE: the inclusion of HA-red (10-phenyl neutral red-PEG ligand) or HA-yellow 
(bisbenzamide-PEG ligand) (Hanse Analytik GmbH, Bremen, Germany) in a 
gel separates identical or closely sized products on base content. Specifically, 
HA-red and -yellow selectively bind to GC and AT DNA motifs, respectively 
(Wawer et al. 1995, Hanse Analytik 1997, personal communication). Since both 
HA-stains possess an overall positive charge, they migrate towards the cathode 
when an electric field is applied. This is in direct opposition to DNA, which 
is negatively charged and, therefore, migrates towards the anode. Thus, if two 
DNA clones are identical in size (as perceived on a standard high resolution 
agarose gel), but differ in AT/GC content, inclusion of an HA-dye in the gel 
will retard the migration of one of the sequences compared to the 
other, making it apparently larger and, thus, providing a means of 
differentiating between the two. The use of HA-red has been shown to resolve 
sequences with an AT variation of less than 1% (Wawer et al. 1995), whilst Hanse 
Analytik have reported that HA staining is so sensitive that in one case it was used 
to distinguish two 567 bp sequences which differed by only a single point mutation 
(Hanse Analytik 1996, personal communication). Therefore, if one wishes to check 
whether all the clones produced from a specific band in a differential display 
experiment are derived from the same gene species, a small amount of reamplified 
or digested clone can be run on a standard high resolution gel, and a second aliquot 

Figure 10. Discrimination of clones of identical/nearly identical size using HA-red. Bands of decreasing 
size (1-5) were extracted from the final display of a subtractive hybridization 
experiment and cloned. [...] from each cloned band and 
inserts amplified using PCR. The products were run on two gels: (A) a high resolution 2% agarose 
gel, and (B) a high resolution 2% agarose gel containing HA-red. With few exceptions, all 
the clones from each band appear to be the same size (gel A). The presence of HA-red 
(gel B), which separates identically sized DNA fragments based on the percentage of GC within 
the sequence, clearly indicates that [...] appear to be the same size, at [least] four 
different gene species are represented. 

in a similar gel containing one of the HA-stains. The standard gel should indicate 
any gross size differences, whilst the HA-stained gel should separate [identically sized] 
species (on standard AGE) according to their base content. Geisinger 
et al. (1997) [report the] successful use of this approach for identifying DD-derived 
clones. Figure 10 shows such an experiment carried out in this laboratory on clones 
obtained from a band extracted from an SSH display. 
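
The resolution figures quoted in this section reduce to simple arithmetic, sketched below in Python. This is an illustration under stated assumptions rather than a description of how the gels behave in practice: the function names are hypothetical, and overall GC fraction is used as a crude stand-in for the AT/GC motif binding of the HA-stains.

```python
def min_resolvable_difference_bp(fragment_bp: float, resolution_fraction: float) -> float:
    """Smallest size difference (bp) a gel can separate for a fragment of this length."""
    return fragment_bp * resolution_fraction


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def may_need_ha_stain(seq_a: str, seq_b: str, resolution_fraction: float = 0.015) -> bool:
    """True if two clones look the same size on a standard gel but differ in GC content."""
    longer = max(len(seq_a), len(seq_b))
    size_gap = abs(len(seq_a) - len(seq_b))
    same_apparent_size = size_gap < min_resolvable_difference_bp(longer, resolution_fraction)
    return same_apparent_size and gc_fraction(seq_a) != gc_fraction(seq_b)


# Agarose at 1.5-2% resolution versus PAGE at 0.2%, for a 1 kb fragment.
for label, fraction in [("agarose 1.5%", 0.015), ("agarose 2%", 0.02), ("PAGE 0.2%", 0.002)]:
    print(label, round(min_resolvable_difference_bp(1000, fraction), 1), "bp")
```

For a 1 kb fragment this gives roughly 15-20 bp for high resolution agarose and about 2 bp for PAGE, matching the figures in the text.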

An alternative approach is to carry out a 2-D analysis of the differential display 
products. In this approach, size-based separation is first carried out in a standard 
agarose gel. The gel slice containing the display is then extracted and incorporated 
into an HA gel for resolution based on AT/GC content. 

Of course, one should always consider the possibility of there being different 
gene species which are the same size and have the same GC/AT content. However, 
even these species are not unresolvable given some effort: again, one might use 
SSCP, or perhaps a denaturing gradient gel electrophoresis (DGGE) or temperature 
gradient gel electrophoresis (TGGE) approach to resolve the contents of a band 
[...] extracted (Suzuki et al. 1991) or [...] amplified. 

The requirement of some differential display techniques to visualize large 
numbers of products (e.g. DD and GEF) can also present a problem in that, in terms 
of numbers, the resolution of PAGE rarely exceeds 300-400 bands. [...] 
gels such as those described [...]. 

Extraction of differentially expressed bands from a gel can be complex since, in 
some cases (e.g. DD, GEF), the results are visualized by autoradiographic means, 
such that precise overlay of the developed film on the gel must occur if the correct 
band is to be extracted for further analysis. Clearly, a misjudged extraction can 
account for many man-hours lost. This problem, and that of the use of radioisotopes, 
has been addressed by several groups. For example, Lohmann et al. (1995) 
demonstrated that silver staining can be used directly to visualize DD bands in 
horizontal PAGs. An et al. (1996) avoided the use of radioisotopes by transferring a 
small amount (20-30%) of the DNA from their DD to a nylon membrane, and 
visualizing the bands using chemiluminescent staining before going back to extract 
the remaining DNA from the gel. Chen and Peck (1996) went one step further and 
transferred the entire DD to a nylon membrane. The DNA bands were then 
visualized using a digoxigenin (DIG) system (DIG was attached to the polydT 
primers used in the differential display procedure). Differentially expressed bands 
were cut from the membrane and the DNA eluted by washing with PCR buffer prior 
to reamplification. 

One of the advantages of using techniques such as SSH and RDA is that the final 
display can be run on an agarose gel and the bands visualized with simple ethidium 
bromide staining. Whilst this approach can provide acceptable results, overstaining 
with SYBR Green I or SYBR Gold nucleic acid stains (FMC) effectively enhances 
the intensity and sharpness of the bands. This greatly aids in their precise extraction 
and often reveals some faint products that may otherwise be overlooked. Whilst 
differential displays stained with SYBR Green I are better visualized using short 
wavelength UV (254 nm) rather than medium wavelength (306 nm), the shorter 
wavelength is much more DNA damaging. In practice, it takes only a few seconds 
to damage DNA extracted under 254 nm irradiation, effectively preventing 
reamplification and cloning. The best approach is to overstain with SYBR Green I 
and extract bands under a medium wavelength UV transillumination. 



The possible use of 'microfingerprinting' to reduce complexity 

Given the sheer number of gene products and the possible complexity of each 
band, an alternative approach to rapid characterization may be to use an enhanced 
analysis of a small section of a differential display— a 'sub-fingerprint' or 'micro- 
fingerprint'. In this case, one could concentrate on those bands which only appear 
in a particular chosen size region. Reducing the fingerprint in this way has at least 
two advantages. One is that it should be possible to use different gel types, 
concentrations and run times tailored exactly to that region. Currently, one might 
run products from 100-3000+ bp on the same gel, which leads to compromise in the 
gel system being used and consequently to suboptimal resolution, both in terms of 
size and numbers, and can lead to problems in the accurate excision of individual 
bands. Secondly, it may be possible to enhance resolution by using a 2-D analysis 
using a HA-stain, as described earlier. In summary, if a range of gene product sizes 
is carefully chosen to include certain 'relevant' genes, the 2-D system standardized, 
and appropriate gene analysis used, it may be possible to develop a method for the 
early and rapid identification of compounds which have similar or widely different 
cellular effects. If the prognosis for exposure to one or more other chemicals which 
display a similar profile is already known, then one could perhaps predict similar 
effects for any new compounds which show a similar micro-fingerprint. 

An alternative approach to microfingerprinting is to examine altered expression 
in specific families of genes through careful selection of PCR primers and/or post- 
reaction analysis. Stress genes, growth factors and/or their receptors, cell cycling 
genes, cytochromes P450 and regulatory proteins might be considered as candidates 
for analysis in this way. Indeed, some off-the-shelf DNA arrays (e.g. Clontech's 
Atlas cDNA Expression Array series) already anticipated this to some degree by 
grouping together genes involved in different responses e.g. apoptosis, stress, DNA- 
damage response etc. 
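
The idea of restricting a display to a chosen size region and then comparing compounds by their micro-fingerprints can be sketched as follows. This Python example is purely illustrative: it assumes bands have already been binned to nominal sizes in bp, it uses Jaccard similarity as one possible comparison measure (the text does not prescribe one), and all names and band sizes are hypothetical.

```python
from typing import Iterable, Set


def micro_fingerprint(band_sizes_bp: Iterable[int], min_bp: int, max_bp: int) -> Set[int]:
    """Keep only the bands that fall inside the chosen size window."""
    return {size for size in band_sizes_bp if min_bp <= size <= max_bp}


def jaccard_similarity(a: Set[int], b: Set[int]) -> float:
    """Shared bands divided by all bands seen in either fingerprint."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical band sizes for a reference compound and a new compound.
reference = micro_fingerprint([100, 350, 600, 800, 1300, 3000], 300, 1000)
candidate = micro_fingerprint([150, 350, 800, 950, 2000], 300, 1000)
print(jaccard_similarity(reference, candidate))  # prints 0.5
```

A high similarity would only suggest, not establish, that two compounds have related cellular effects; confirmation by the screening steps described below would still be needed.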



Screening 

False positives 

The generation of false positives has been discussed at length amongst the 
differential display community (Liang et al. 1993, 1995, Nishio et al. 1994, Sun et al. 
1994, Sompayrac et al. 1995). The reason for false positives varies with the 
technique being used. For instance, in RDA, the use of adaptors which have not 
been HPLC purified can lead to the production of false positives through illegitimate 
ligation events (O'Neill and Sinclair 1997), whilst in DD they can arise through 
PCR artifacts and illegitimate transcription of rRNA. In SH, false positives appear 
[to arise primarily] from abundant gene species, although some may arise from 
cDNA/mRNA species which do not undergo hybridization for technical reasons. 

A quick screening of putative differentially expressed clones can be carried out 
using a simple dot blot approach, in which labelled first strand probes synthesized 
from tester and driver mRNA are hybridized to an array of said clones (Hedrick et 
al. 1984, Sakaguchi et al. 1986). Differentially expressed clones will hybridize to 
tester probe, but not driver. The disadvantage of this approach is that rare species 
may not generate detectable hybridization signals. One option for those using SSH 
is to screen the clones using a labelled probe generated from the subtracted cDNA 
from which it was derived, and with a probe made from the reverse subtraction 
reaction (ClonTechniques 1997a). Since the SSH method enriches rare sequences 
it should be possible to confirm the presence of clones representing low abundance 
genes. Despite this quick screening step, there is still the need to go back to the 
original mRNA and confirm the altered expression using a more quantitative 
approach. Although this may be achieved using Northern blots, the sensitivity is 
poor by today's high standards and one must rely on PCR methods for accurate and 
sensitive determinations (see below). 
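
The dot blot decision rule described above can be written down as a short sketch. The Python below is a hedged illustration, not a published protocol: the signal units, the `background` cut-off and the `min_ratio` of 3 are all assumptions chosen for the example, and the function name is hypothetical.

```python
def classify_clone(tester_signal: float, driver_signal: float,
                   background: float = 1.0, min_ratio: float = 3.0) -> str:
    """Classify a clone from its hybridization signals with tester and driver probes."""
    tester_positive = tester_signal > background
    driver_positive = driver_signal > background
    if not tester_positive and not driver_positive:
        # Rare species may give no detectable signal with either probe.
        return "undetected"
    if tester_positive and not driver_positive:
        return "candidate"
    if tester_positive and tester_signal >= min_ratio * driver_signal:
        # Present in both, but substantially stronger with the tester probe.
        return "candidate"
    return "false positive"


for signals in [(10.0, 0.5), (10.0, 2.0), (10.0, 8.0), (0.2, 0.3)]:
    print(signals, classify_clone(*signals))
```

Clones classed as "undetected" are not necessarily false positives, which is why the text recommends returning to the original mRNA with a more quantitative method.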



Sequence analysis 

The majority of differential display procedures produce final products which are 
between 100 and 1000 bp in size. However, this may considerably reduce the size of 
the sequence for analysis of the DNA databases. This in turn leads to a reduced 
confidence in the result: several families of genes have members whose DNA 
sequences are almost identical except in a few key stretches, e.g. the cytochrome 
P450 gene superfamily (Nelson et al. 1996). Thus, does the clone identified as being 
almost identical to gene X0 really come from that gene, or its brother gene X1, or its 
as yet undiscovered sister X2? For example, using SSH, part of a gene was isolated 
which was up-regulated in the liver of rats exposed to Wy-14,643 and was identified 
by a FASTA search as being transferrin (data not shown). However, transferrin is 
known to be downregulated by hypolipidemic peroxisome proliferators such as 
Wy-14,643 (Hertz et al. 1996), and this was confirmed with subsequent RT-PCR 
analysis. This suggests that the gene sequence isolated may belong to a gene which 
is closely related to transferrin, but is regulated by a different mechanism. 

A further problem associated with SH technology is redundancy. In most cases, 
before SH is carried out, the cDNA population must first be simplified by restriction 
digestion. This is important for at least two reasons: 

(1) To reduce complexity: long cDNA fragments may form complex networks 
which prevent the formation of appropriate hybrids, especially at the high 
concentrations required for efficient hybridization. 

(2) Cutting the cDNAs into small fragments provides better representation of 
individual genes. This is because genes derived from related but distinct 
members of gene families often have similar coding sequences that may 
cross-hybridize and be eliminated during the subtraction procedure (Ko 1990). 
Furthermore, different fragments from the same cDNA may differ considerably 
in terms of hybridization and amplification and, thus, may not efficiently do one 
or the other (Wang and Brown 1991). Thus, some fragments from differentially 
expressed cDNAs may be eliminated during subtractive hybridization 
procedures. However, other fragments may be enriched and isolated. 

As a consequence of this, some genes will be cut one or more times, giving rise to two 
or more fragments of different sizes. If those same genes are differentially 
expressed, then two or more of the different size fragments may come through 
as separate bands on the final differential display, increasing the observed 
redundancy and increasing the number of redundant sequencing reactions. 
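
The way restriction digestion turns one cDNA into several fragments, and hence several possible bands, can be illustrated with a minimal sketch. The recognition site `GTAC` with a blunt cut after the second base (the pattern of the enzyme RsaI) is an assumption made for the example; the text above does not name the enzyme, and the sequence used is invented.

```python
from typing import List


def digest(cdna: str, site: str = "GTAC", cut_offset: int = 2) -> List[str]:
    """Cut a sequence at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments = []
    start = 0
    pos = cdna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cdna[start:cut])
        start = cut
        pos = cdna.find(site, pos + 1)
    fragments.append(cdna[start:])
    # Drop empty fragments that would arise if a cut fell at the very end.
    return [fragment for fragment in fragments if fragment]


# A made-up 16-base sequence containing two sites yields three fragments.
fragments = digest("AAAGTACCCCGTACTT")
print(fragments)  # prints ['AAAGT', 'ACCCCGT', 'ACTT']
print([len(f) for f in fragments])  # prints [5, 7, 4]
```

If the parent gene is differentially expressed, each of these fragments could in principle appear as its own band, which is the source of the redundancy described above.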

Sequence comparisons also throw up another important point: at what degree 
of sequence similarity does one accept a result? Is 90% identity between a gene 
derived from your model species and another acceptably close? Is 95% between 
your sequence and one from the same species also acceptable? This problem is 
particularly relevant when the forward and reverse sequence comparisons give 
similar sequences with completely different gene species! An arbitrary decision 
seems to be to allocate genes that are definite (95% and above similarity) and then 
group those between 60 and 95% as being related or possible homologues. 
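
The arbitrary banding suggested above can be expressed as a small sketch. The Python below assumes two sequences that have already been aligned to equal length without gaps, which is a simplification of what a FASTA search reports; the function names and the "unassigned" label for matches below 60% are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical positions between two equal-length aligned sequences."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x == y)
    return 100.0 * matches / len(aligned_a)


def classify_match(identity_pct: float) -> str:
    """Apply the 95% and 60% cut-offs described in the text."""
    if identity_pct >= 95.0:
        return "definite"
    if identity_pct >= 60.0:
        return "related or possible homologue"
    return "unassigned"


identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity, classify_match(identity))  # prints 90.0 related or possible homologue
```

The cut-offs are parameters of convenience; as the text notes, the same percentage may deserve different weight within a species than across species.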



Quantitative analysis 

At some point, one must give consideration to the quantitative analysis of the 
candidate genes, either as a means of confirming that they are truly differentially 
expressed, or in order to establish just what the differences are. Northern blot 
analysis is a popular approach as it is relatively easy and quick to perform. However, 
the major drawback with Northern blots is that they are often not sensitive enough 
to detect rare sequences. Since the majority of messages expressed in a cell are of low 
abundance (see table 1), this is a major problem. Consequently, RT-PCR may be the 
method of choice for confirming differential expression. Although the procedure is 
somewhat more complex than Northern analysis, requiring synthesis of primers and 
optimization of reaction conditions for each gene species, it is now possible to set up 
high throughput PCR systems using multichannel pipettes, 96-well plates and 
[...] 

[...] carcinogenic effect. Whilst differential display technology cannot hope to answer 
these questions, it does provide a springboard from which identification, regulatory 
and functional studies can be launched. Understanding the molecular mechanism of 
cellular responses is almost impossible without knowing the regulation and function 
of those genes and their condition (e.g. mutated). In an abstract sense, differential 
display can be likened to a still photograph, showing details of a fixed moment in 
time. Consider the historian who knows the outcome of a battle and the [...] 
and condition of the troops before the battle commenced, but is asked to try and 
deduce how the battle progressed and why it ended as it did from a still 
photograph: an impossible task. In order to understand the battle, the historian 
must find out the capabilities and motivation of the soldiers and their [...] 
officers, what the orders were and whether they [...] the 
terrain, the remains of the battle and consider the effects the prevailing weather 
conditions exerted. Likewise, if mechanistic answers are to be forthcoming, [one] 
must use differential display in combination with other techniques, [such as] 
knockout technology, the analysis of cell signalling pathways, [...] 
time and dose response analyses. Although this review has emphasized the 
importance of differential gene profiling, it should not be considered in isolation, and 
the full impact of this approach will be strengthened if used in conjunction with 
functional genomics and proteomics (2-dimensional protein gels from isoelectric 
focusing and subsequent SDS electrophoresis and virtual 2D maps [...] 
electrophoresis). Proteomics is attracting much recent attention as many of the 
changes resulting in differential gene expression do not involve changes in mRNA 
levels as described extensively herein, but rather protein-protein, protein-DNA and 
protein phosphorylation events which would require functional genomics or 
proteomic technologies for investigation. 

Despite the limitations of differential display technology, it is clear that many 
potential applications and benefits can be obtained from [...] the genetic 
changes that occur in a cell during normal and disease development and in response 
to chemical or biological insult. In light of functional data, such profiling will 
provide a 'fingerprint' of each stage of development or response, and in the long 
term should help in the elucidation of specific and sensitive biomarkers for different 
types of chemical/biological exposure and disease states. The potential medical and 
therapeutic benefits of understanding such molecular changes are almost 
immeasurable. Amongst other things, such fingerprints could indicate the family or 
even specific type of chemical an individual has been exposed to plus the length 
and/or acuteness of that exposure, thus indicating the most prudent treatment. 
They may also help uncover differences in histologically identical cancers, provide 
diagnostic tests for the earliest stages of neoplasia and, again, perhaps indicate the 
most efficacious treatment. 



The Human Genome Project will be completed early in the next century and the 
DNA sequence of all the human genes will be known. The continuing development 
and evolution of differential gene expression technology will ensure that this 
knowledge contributes fully to the understanding of human disease processes. 

SUMMARY 

The technique of differential display reverse transcription-polymerase chain reaction (ddRT-PCR) has been used to produce unique 
profiles of up-regulated and down-regulated gene expression in the liver of male Wistar rats following short term exposure to the 
non-genotoxic hepatocarcinogens, phenobarbital and WY-14,643. Animals were treated for 3 days, whereupon their livers were 
extracted and snap frozen. mRNA was prepared from the livers and used for ddRT-PCR. Individual bands from the differential 
displays were extracted and cloned. False positives were eliminated by dotblot screening and true positives then sequenced and 
identified. 

INTRODUCTION 

Safety evaluation of new chemicals usually necessitates the examination of genotoxic and carcinogenic 
potential using short-term in vitro and in vivo genotoxicity assays augmented by chronic bioassay tests. 
The short-term assays have proved useful in the early identification of potential genotoxic carcinogens, but 
their value is limited by observations which suggest that approximately 60% of chemicals identified as 
carcinogens in life-exposure studies produce mainly negative findings in short-term genotoxicity tests (1,2). 
Thus, there is currently no reliable and rapid means of evaluating the carcinogenic risk of new chemicals 
which fall into this latter group of compounds, termed non-genotoxic (or epigenetic) carcinogens. 

It is now evident that non-genotoxic carcinogens constitute a group of chemicals which are not only 
divergent in their interspecies toxicity, but also demonstrate different target organ selectivities and 
mechanisms of action (3,4). Elucidation of the molecular mechanisms underlying non-genotoxic carcinogenesis 
is currently underway, but the picture is still far from complete. It is anticipated that a better understanding 
of the early changes in genetic expression following exposure to non-genotoxic carcinogens will aid 
development of experimental strategies to identify cellular markers which are diagnostic for this type of toxicity. 

Subtractive ddRT-PCR is a recently developed technique which facilitates the preferential amplification 
of gene products that demonstrate altered expression in target tissue(s) following exposure to chemical 
stimuli. Furthermore, using this technique, no prior knowledge of the specific genes which are up/down 
regulated is required. In the current study, we have undertaken to develop a specific and rapid assay for 
non-genotoxic carcinogens using the technique of ddRT-PCR. This has allowed us to identify characteristic 
patterns of gene regulation following administration of two different non-genotoxic carcinogens 
(phenobarbital and Wy-14,643) and the subsequent identification of individual gene species which are 
regulated by this xenobiotic treatment. 



MATERIALS AND METHODS 

Animals and treatment 

Phenobarbital (BDH, Poole, UK; 100 mg/kg/day) or 
[4-chloro-6-(2,3-xylidino)-2-pyrimidinylthio]acetic 
acid (Wy-14,643) (Campo, Emmerich; 250 
mg/kg/day) was administered by gavage to groups of 
3 male Wistar rats (150-200 g) on three consecutive 
days, whilst control animals received nothing. All ani- 
mals had free access to food (rat and mouse standard 
diet, B&K Universal, Hull, UK) and water. The ani- 
mals were killed on the fourth day, whereupon their 
livers were excised, sliced into 0.5 cm cubes, snap fro- 
zen in liquid nitrogen and then stored at -70°C. 

mRNA extraction 

Up to 0.25 g of each frozen liver sample was ground 
under liquid nitrogen using a mortar and pestle. 
mRNA was extracted from the ground liver using 
Promega's PolyATtract® System 1000 (Promega, 
Madison, WI, USA) according to the technical man- 
ual. The mRNA was DNase-treated (Promega, final 
concentration 10 U/ml) before phenol/chloroform ex- 
traction and ethanol precipitation. The mRNA was re- 
suspended at a final concentration of 500-1000 ng/µl. 

ddRT-PCR 

This was carried out using the PCR-Select™ cDNA 
Subtraction Kit (Clontech, Palo Alto, CA, USA) ac- 
cording to the manufacturer's instructions. Final PCR 
reactions were run on a 2% Metaphor agarose (FMC, 
Rockland, MD, USA) gel containing ethidium bro- 
mide (Sigma, Dorset, UK) and then overstained for 30 
min with SYBR Green I DNA stain (FMC, 1:10 000 
dilution in TAE). 



Band extraction and cloning 

Each discernible band from the differential display 
pattern was extracted from the gel with a scalpel and 
the DNA eluted using a Genelute™ Agarose Spin Column 
(Supelco, Bellefonte, PA). An aliquot of the eluted 
DNA (5 µl) was re-amplified using the original ddRT-PCR 
nested primers and electrophoresed on a 2% 
agarose gel. The re-amplified band was extracted from 
the gel (as above) and the eluted DNA ligated directly 
into the TOPO TA Cloning® vector (Invitrogen, 
Carlsbad) before transformation in Escherichia coli 
TOP10F One Shot™ cells (Invitrogen). 



Stage I screening 

Twelve transformed (white) colonies from each band 
were grown up for 6 h in 200 µl LB broth containing 
ampicillin (Sigma, 50 µg/ml) and 1 µl of this was amplified 
by PCR (as specified in the ddRT-PCR technical 
manual). One quarter of the completed reaction 
was electrophoresed on a standard 2% agarose gel and 
one quarter on a 2% agarose gel containing HA Yellow 
(Hanse Analytik GmbH, Bremen, Germany; 1 
U/µl) to discern the different cloning products. The 
remainder was used to prepare duplicate dotblots on 
Hybond N+ (nylon) membranes (Amersham, Little 
Chalfont, UK). Cultures containing different cloning 
products were grown up and a plasmid miniprep prepared 
from each (Wizard Plus SV Minipreps DNA Purification 
System, Promega) according to the 
manufacturer's instructions. 



Stage II screening 

The duplicate dotblots were probed with: (a) the final 
differential display reaction; and (b) the 'reverse-subtracted' 
differential display reaction. To make the 
'reverse-subtracted' probe, the subtractive hybridisation 
step of the ddRT-PCR procedure was carried out using 
the original tester cDNA as a driver and the driver as 
a tester. Probing and visualisation were carried out using 
the ECL Direct Nucleic Acid Labelling and Detection 
System (Amersham) according to the 
manufacturer's instructions. Those clones which were positive 
for (a) but negative for (b), or showed a substantially 
larger positive signal with (a) compared to (b), were 
chosen for further analysis. 



DNA sequencing 

Positive clones as identified above were sequenced on 
an automated ABI DNA sequencer (Applied Biosystems, 
Warrington, UK). 

[...] Wy-14,643 treatment; [...] 
phenobarbital treatment; and lane 6, 1 kb ladder. (B) Subtractive ddRT-PCR patterns obtained from [...] 
changes when phenobarbital-treated mRNA is subtracted from Wy-14,643-treated mRNA and vice versa. Lane 1, 1 kb 
ladder; lane 2, genes showing increased expression following Wy-14,643 treatment compared to phenobarbital [...] 

Fig. 2: Re-amplified ddRT-PCR products which were down-regulated following phenobarbital treatment (upregulated bands were also 
re-amplified but gel not shown). Individual DNA bands excised from the gel of ddRT-PCR reactions were extracted [...] 
on agarose gel to confirm amplification of each band (numbered). See Materials and Methods for [...] 

Band number (Fig. 2) 
(Approximate size in bp) 



Highest sequence homology 



FASTA-EMBL gene identification 

Rat mRNA for 3-oxoacyl-CoA thiolase 
Rat hemopexin mRNA 
R. rattus alpha-2u-globulin mRNA 
M. musculus mRNA for C1 inhibitor 
Rat electron transfer flavoprotein 
Mouse topoisomerase 1 (Topo 1) mRNA 
Soares 2NbMT M. musculus (EST) 
Rat alpha-2u-globulin (s-type) mRNA 
Soares mouse NML M. musculus (EST) 
Soares p3NMF19.5 M. musculus (EST) 
Soares mouse NML M. musculus (EST) 
NCI_CGAP_Pr1 H. sapiens (EST) 
R. norvegicus mRNA for ribosomal protein 
Soares mouse embryo NbME13.5 (EST) 
Rat fibrinogen B-beta-chain 
Rat apolipoprotein E gene 
Soares p3NMF19.5 M. musculus (EST) 
Stratagene mouse testis (EST) 
R. norvegicus RASP 1 mRNA 
Soares mouse mammary gland (EST) 

EST = expressed sequence tag. 
Bands 4-6 were shown to be false positives by dotblot analysis and, therefore, not sequenced. 

Band number                           Highest sequence 
(approximate size in bp)    Clone     homology 

11 (525)                              95.7% 
12 (375)                              100.0% 
13 (230)                    Clone 1   97.2% 
                            Clone 2   100.0% 
                            Clone 3   100.0% 
14 (170)                              96.0% 
15 (140)                              97.3% 
[...] (300)                           96.7% 
[...] (275)                           93.1% 



Table II: Rat liver genes up-regulated by phenobarbital treatment 



Band number 
(Approximate size in bp) 

5(1300) 
7(1000) 

8(950) 
10(850) 

11 (800) 

12 (750) 
15(600) 
16(550) 
21 (350) 



Phenobarbital up-regulated 
Highest sequence homology 



Clone 1 
Clone 2 



Clone 1 
Clone 2 



93.5% 
95.1% 

98.3% 
95.7% 
94.9% 
75.3% 
93.8% 

92.9% 

95.2% 
93.6% 
99.3% 



FASTA-EMBL gene identification 

Rat cytochrome P450IIB1 
mRNA for rat preproalbumin 
Rat serum albumin mRNA 
NCI_CGAP_Pr1 H. sapiens (EST) 
Rat cytochrome P450IIB1 
Rat cytochrome P450IIB1 
Rat cytochrome p450-L (p450IIB2) 
Rat TRPM-2 mRNA 
Rat mRNA for sulfated glycoprotein 
mRNA for rat preproalbumin 
Rat serum albumin mRNA 
Rat cytochrome P450IIB1
Rat haptoglobin mRNA partial alpha
R. norvegicus genes for 18S, 5.8S & 28S rRNA

EST = expressed sequence tag.

Bands 6, 9, ... 20 were shown to be false positives by dotblot analysis and, therefore, not sequenced.

Identification of differentially regulated
genes

Gene sequences were identified using the FASTA programme
(http://www.ebi.ac.uk/htbin/fasta.py?request)
to search all EMBL databases for matching DNA
sequences.



RESULTS 

Figure 1A,B shows the ddRT-PCR patterns of genes 
showing altered expression in rat liver following 3 day 
treatment with phenobarbital or Wy-14,643. Individual 
bands were isolated from the phenobarbital-modulated
patterns (both up- and down-regulated), re-amplified
(Fig. 2), cloned, screened for false positives and then
identified. Those xenobiotic-modulated gene products
identified to date are listed in Tables I and II.



DISCUSSION 

The advent of combinatorial chemistry has led to the 
synthesis of millions of new chemical compounds, 
many of which may be potentially useful in pharma- 
ceutical, agricultural or industrial applications. How- 
ever, whilst there are tests available for those posing a 
genotoxic activity, there remains no short-term assay 
able to identify those chemicals which may belong to 
the non-genotoxic group of carcinogens. 

We have used an adaptation of the subtractive hy- 
bridisation method - ddRT-PCR - to produce charac- 
teristic profiles or 'fingerprints' of those genes which 
are up-regulated or down-regulated in male rat liver 
following acute exposure to test chemicals. The ddRT- 
PCR profiles are characteristic and unique for each of 
the 2 compounds studied to date. 

A number of those gene species showing altered
expression following phenobarbital treatment have
been cloned and identified (Tables I & II). It is
interesting to note the presence of CYP2B2 in the
up-regulated genes. This would, of course, be expected
following exposure to phenobarbital and serves as a
positive control for the method. Other genes which one
might normally expect to be up-regulated do not
appear in the table. However, it should be noted that not
all bands seen on the differential display were
extracted and re-amplified, because some were too faint or
too close to other bands to excise accurately.
Furthermore, it has been well documented [(5) and references
therein] that a single band extracted from a differential
display often represents a composite of heterogeneous
products. We are currently examining new methods to:
(i) improve resolution of the differential display
patterns (including 2-D agarose gels); and (ii) distinguish
those ddRT-PCR products which are identical in size,
but different in sequence.

Our future efforts will be directed towards deter- 
mining the extent of modulation of a number of the 
genes reported herein using semi-quantitative RT- 
PCR. This should reveal the extent of changes in ex- 
pression of key gene products which may be involved 
in non-genotoxic hepatocarcinogenesis and thus help
increase understanding of this process. Furthermore it 
is anticipated that aligning ddRT-PCR profiles of dif- 
ferent non-genotoxic agents found in responsive and 
non-responsive species may enable identification of 
those genes which are mechanistically relevant to the 
non-genotoxic hepatocarcinogenic process. Accord- 
ingly, this approach lends itself well to the identifica- 
tion, characterisation and sub-classification of possible 
different classes of non-genotoxic carcinogens. 
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Use of suppression-PCR subtractive hybridisation to identify 
genes that demonstrate altered expression in male rat and 
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Abstract 

Understanding the genetic profile of a cell at all stages of normal and carcinogenic development should provide an 
essential aid to developing new strategies for the prevention, early detection, diagnosis and treatment of cancers. We 
have attempted to identify some of the genes that may be involved in peroxisome-proliferator (PP)-induced 
non-genotoxic hepatocarcinogenesis using suppression PCR subtractive hybridisation (SSH). Wistar rats (male) were 
chosen as a representative susceptible species and Duncan-Hartley guinea pigs (male) as a resistant species to the 
hepatocarcinogenic effects of the PP, [4-chloro-6-(2,3-xylidino)-2-pyrimidinylthio] acetic acid (Wy-14,643). In each 
case, groups of four test animals were administered a single dose of Wy-14,643 (250 mg/kg per day in corn oil) by 
gastric intubation for 3 consecutive days. The control animals received corn oil only. On the fourth day the animals 
were killed and liver mRNA extracted. SSH was carried out using mRNA extracted from the rat and guinea pig 
livers, and used to isolate genes that were up and downregulated following Wy-14,643 treatment. These genes 
included some predictable (and hence positive control) species such as CYP4A1 and CYP2C11 (upregulated and 
downregulated in rat liver, respectively). Several genes that may be implicated in hepatocarcinogenesis have also been 
identified, as have some unidentified species. This work thus provides a starting point for developing a molecular 
profile of the early effects of a non-genotoxic carcinogen in sensitive and resistant species that could ultimately lead 
to a short-term assay for this type of toxicity. © 2000 Elsevier Science Ireland Ltd. All rights reserved. 

Keywords: Wy-14,643; Peroxisome proliferator; Non-genotoxic hepatocarcinogenesis; Suppression PCR subtractive hybridisation;
RT-PCR; Rat; Guinea pig; Gene regulation; Differential gene display; Gene profiling
1. Introduction

The advent of combinatorial chemistry and
computer-aided drug design has led to a recent
upsurge in the number of chemical compounds
that have potential therapeutic, agricultural and
industrial applications. Although it has been
suggested that the contribution of synthetic chemicals
to the overall incidence of human cancer is low,
there still remains an absolute requirement to
evaluate all new chemicals for toxic and
carcinogenic potential. The latter is one of the most
problematic areas of chemical safety evaluation
and is usually carried out using short-term in vitro
and in vivo genotoxicity assays augmented by
chronic bioassay tests. The short-term assays have
proved useful in the early identification of
potential genotoxic carcinogens, but their value is
limited by observations that suggest that
approximately 60% of chemicals identified as
carcinogens in life-exposure studies produce mainly
negative findings in short-term genotoxicity tests
(Ashby, 1992; Parodi, 1992). Thus, there is
currently no reliable and rapid means of evaluating
the carcinogenic risk of new chemicals that fall
into this latter group of compounds, termed
non-genotoxic (or epigenetic) carcinogens.

One approach to addressing this problem is to
elucidate the molecular mechanisms by which
known non-genotoxic carcinogens act. It should
then be possible to identify common factors/
mechanisms that can serve as early biomarkers of
carcinogenic potential for new chemicals. To this
end, a large number of groups have reported on
the various effects of non-genotoxic compounds
in various animal species (Marsman et al., 1988;
Lake et al., 1993; Cattley et al., 1994; Hayashi et
al., 1994; Human and Experimental Toxicology,
1994; Anderson et al., 1996). However, the
mechanistic picture is still far from complete with many
of those genes involved in the carcinogenic
process remaining unknown, and their identification
therefore remains a key goal in elucidating the
molecular mechanisms by which non-genotoxic
carcinogenesis occurs.

Subtractive hybridisation (SH) and related
technologies such as representational difference
analysis (RDA) (Hubank and Schatz, 1994) and
differential display (DD) (Liang and Pardee,
1992) can be used to aid the isolation of genes
showing altered expression in target tissues
following exposure to a chemical stimulus. These
techniques can also be used to identify differential
gene expression in neoplastic and normal cells
(Liang et al., 1992), infected and normal cells
(Duguid and Dinauer, 1990), differentiated and
undifferentiated cells (Sargent and Dawid, 1983;
Guimaraes et al., 1995), activated and dormant
cells (Gurskaya et al., 1996; Wan et al., 1996),
and different cell types (Hedrick et al., 1984; Davis et
al., 1984), amongst others. Most importantly,
using such approaches, no prior knowledge of the
specific genes that are upregulated/downregulated
is required.

Using a variation of SH, termed suppression- 
PCR subtractive hybridisation (SSH) (Diatchenko 
et al., 1996), we have previously reported the 
isolation of a number of genes showing altered 
expression in male rat liver following acute expo- 
sure to phenobarbital (Rockett et al., 1997). In 
the current work we have used the same experi- 
mental approach to isolate genes that are differen- 
tially expressed in the livers of male rats and 
guinea pigs following short-term (3-day) exposure 
to the peroxisome proliferator (PP) and
non-genotoxic hepatocarcinogen, Wy-14,643. We have
isolated and identified a number of gene species, 
some of which may be important in the induction 
of, or protection against, non-genotoxic 
hepatocarcinogenesis. 



2. Materials and methods 

2.1. Animals and treatment 

All animal experiments were undertaken in ac- 
cordance with Her Majesty's Home Office De- 
partment guidelines under the auspices of 
approved personal and project licences. Male 
Wistar rats (150-200 g) and male Duncan-Hart- 
ley guinea pigs (250-300 g) were obtained from 
Kingman and Bantam (Hull, UK). Upon receipt, 
both groups were randomly assigned into two 
groups of four. They were maintained on a rat, 
mouse or guinea pig standard diet (B&K Universal,
Hull) and a daily cycle of alternating 12-h
periods of dark and light. The room temperature
was maintained at 19°C and the relative humidity at
55%. The animals were acclimatised to this
environment for 7 days before treatment commenced.
[4-chloro-6-(2,3-xylidino)-2-pyrimidinylthio] acetic 
acid (Wy-14,643, Campo, Emmerich; 250 mg/kg 
per day in corn oil) was administered by gavage 
to the treated groups of rats and guinea pigs on 3 
consecutive days, whilst control groups received 
an equal volume of corn oil only. During this 
time, all animals had free access to food and 
water. The animals were killed by cervical disloca- 
tion on the fourth day, and their livers immedi- 
ately excised, weighed, sliced into approximately 
0.5-cm cubes, snap frozen in liquid nitrogen and 
stored at -70°C.

2.2. mRNA extraction 

Approximately 0.25 g of each frozen liver sam- 
ple was ground under liquid nitrogen using a 
mortar and pestle. Messenger RNA was extracted 
from the ground liver using the PolyATtract® 
System 1000 kit (Promega, Madison, USA) ac- 
cording to the technical manual provided by the 
manufacturers. The mRNA was DNase-treated
(RQ1 RNase-free DNase, Promega, final concentration
10 U/ml) before phenol/chloroform extraction
and ethanol precipitation. The mRNA was
redissolved at a final concentration of 500-1000 ng/µl.

2.3. cDNA Subtraction 

This was carried out using the PCR-Select™ 
cDNA Subtraction Kit (Clontech, Palo Alto, 
USA) according to the manufacturer's instruc- 
tions. Subtractions were carried out with mRNAs 
derived from single animals. The mRNA from the 
remaining three animals in each group was later 
used for quantitative RT-PCR analysis of specific 
genes. 

2.4. Band extraction and cloning 

The secondary PCR reactions from the cDNA 
subtraction procedure were run on a 2%
MetaPhor agarose gel (FMC, Rockland, USA)
containing 0.5 µg/ml ethidium bromide (Sigma,
Dorset, UK). 1 x TAE (0.04 M Tris-acetate,
0.001 M EDTA) was used to prepare the gel
and as the running buffer. After running for 6-7
h at 3.75 V/cm, the gel was overstained for 30 min
with SYBR Green I DNA stain (FMC, 1:10000
dilution in 1 x TAE). Each discernible band of
the differential display pattern was extracted from
the gel with a scalpel and the DNA eluted using a
Genelute™ agarose spin column (Supelco,
Bellefonte, USA). Five microlitres of the eluted DNA
was reamplified using the original nested
(secondary) PCR primers supplied with the
PCR-Select™ cDNA subtraction kit. The PCR products
were electrophoresed on a 2% standard agarose
gel (Boehringer Mannheim, East Sussex, UK) and
the reamplified target bands extracted from the
gel as above. The eluted DNA was immediately
ligated into a TOPO TA Cloning® vector
(Invitrogen, Carlsbad, USA) before transformation in
Escherichia coli TOP10F' One Shot™ cells
(Invitrogen).

2.5. Colony screening 

2.5.1. Stage I 

Eight transformed (white) colonies from each 
band were grown up for 6 h in 200 ul LB broth 
containing ampicillin (Sigma, 50 mg/ml). One mi- 
crolitre of this was subjected to PCR using the 
same conditions and nested primers as described 
above. One tenth (2 ul) of the completed PCR 
reaction was electrophoresed on a 2% standard 
agarose gel and one tenth on a 2% standard 
agarose gel containing HA red (Hanse Analytik 
GmbH, Bremen, Germany, 1 U/ml) to discern the 
differentially cloned products. The remainder of 
the PCR reaction was used to prepare duplicate 
dotblots on Hybond N+ membranes (Amersham,
Little Chalfont, UK).

2.5.2. Stage II 

The duplicate dotblots were probed with (a) the
final differential display reaction and (b) the
'reverse-subtracted' differential display reaction. To
make the 'reverse-subtracted' probe, the subtractive
hybridisation step of the differential display
RT-PCR procedure was carried out using the
original tester (treated) mRNA as the driver and
the original driver (control) mRNA as the tester.
Probing and visualisation were carried out using
the ECL direct nucleic acid labelling and detection
system (Amersham, Little Chalfont, UK) according
to the manufacturer's instructions. Those
clones that were positive for (a) but negative for
(b), or showed a substantially larger positive signal
with (a) compared to (b), were selected for
DNA sequence analysis.
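
The selection rule described above can be sketched in a few lines of Python. This is an illustration only: the function name, the arbitrary signal units, the detection threshold and the fold cutoff used to stand in for "substantially larger" are all assumptions, since the text gives no numerical criteria.

```python
def select_for_sequencing(forward_signal, reverse_signal,
                          detection_threshold=0.1, fold_cutoff=3.0):
    """Decide whether a clone goes forward to sequencing.

    forward_signal: dot-blot signal with probe (a), the subtracted display.
    reverse_signal: dot-blot signal with probe (b), the reverse-subtracted display.
    detection_threshold and fold_cutoff are hypothetical values chosen for
    illustration; the paper does not state numerical criteria.
    """
    positive_a = forward_signal >= detection_threshold
    positive_b = reverse_signal >= detection_threshold
    if positive_a and not positive_b:
        # Positive for (a) but negative for (b).
        return True
    if positive_a and positive_b:
        # Positive for both: keep only if (a) is substantially larger.
        return forward_signal >= fold_cutoff * reverse_signal
    # Negative for (a): not selected.
    return False
```

A clone with signals (1.0, 0.0) or (1.0, 0.2) would be selected under these assumed settings, whereas (1.0, 0.9) or (0.0, 0.0) would not.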

2.6. DNA sequencing

The remainder of the cultures (prepared in
stage I screening) containing different cloning
products (as discerned in the two screening steps)
were grown up overnight in 5 ml LB broth
containing ampicillin (50 mg/ml). A plasmid miniprep
was prepared from each (Wizard Plus SV
Minipreps DNA purification system, Promega)
according to the manufacturer's instructions. The
cloned inserts were sequenced on an automated
...31 DNA sequencer (Applied Biosystems,
Warrington, UK) using the M13 forward primer
(GTAAAACGACGGCCAGT) or M13 reverse
primer (AACAGCTATGACCATG).

2.7. Identification of differentially regulated genes

Gene sequences thus obtained were identified
using the FASTA 3.0 programme (Lipman and
Pearson, 1985; Pearson and Lipman, 1988)
(http://www.ddbj.nig.ac.jp/E-mail/homology.html) to
search all EMBL databases for matching DNA
sequences. Each clone sequence was submitted in
the forward and reverse direction, and the one
returning the highest statistical probability of
match to a known sequence was noted. Sequence
homologies between our submitted clone sequence
and the queried database sequence were
determined (by FASTA) over a region of at least 60
base pairs.
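
As a rough illustration of the two ideas in this step (querying a clone in both orientations, and reporting homology over a region of at least 60 bp), the sketch below computes a reverse complement and a simple percent identity. It is not the FASTA algorithm: it assumes the two sequences have already been aligned without gaps and are of equal length, and the function names are hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(query, subject, min_length=60):
    """Percent of identical positions between two pre-aligned sequences.

    Assumes an ungapped alignment of equal-length strings; the 60 bp
    minimum mirrors the region length quoted in the text.
    """
    if len(query) != len(subject):
        raise ValueError("sequences must be the same length")
    if len(query) < min_length:
        raise ValueError("aligned region is shorter than min_length")
    matches = sum(q == s for q, s in zip(query.upper(), subject.upper()))
    return 100.0 * matches / len(query)
```

In use, a clone sequence and its `reverse_complement` would each be compared against the database hit, and the orientation giving the better match reported.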

2.8. RT-PCR analysis of selected candidate genes

cDNA sequences of the target genes were
obtained from the NIH gene database (GenBank at
http://www.ncbi.nlm.nih.gov/Web/Search/index.html)
and the computer programme Gene
Jockey (BioSoft, Cambridge, UK) used to select
primer pairs from these sequences. Where guinea
pig sequences were available, rat and guinea pig
sequences were aligned and primers chosen from
regions of homology. If guinea pig sequences were
not available, rat and human sequences were
used. In cases where exact homology could not be
found, the sequence from the rat was used. In the
case of CD81 only, no rat or guinea pig sequences
were available and so mouse and human
sequences were aligned and a primer pair chosen
from a region of homology. Primers (obtained
from Gibco-BRL, Paisley, UK) were dissolved at
a concentration of 50 pmol/µl in sterile distilled
water and stored at -20°C. The primer pairs
used plus other reaction parameters are shown in
Table 1. mRNA was extracted (as described
above) from all four treated animals and from
three animals in the control group. Integrity of
the eluted mRNA was confirmed on a 2% agarose
gel, and the concentration and purity were
measured using a Genequant II spectrophotometer
(LKB, Bromma, Sweden). The mRNA was then diluted to 10
ng/µl. One microlitre of this latter solution was
used per RT-PCR reaction.

RT-PCR was carried out in a single-tube (50 µl)
reaction using the Access RT-PCR system
(Promega) according to manufacturer's instruc- 
tions. In the kinetic and quantitative analyses, 
omission of RNA was used as a control for the 
presence of any contaminating DNA. After ob- 
taining a PCR signal of the correct size and 
optimising the reaction conditions, each PCR 
product was digested with between two and four 
separate restriction enzymes. Specific restriction 
patterns were thus obtained, which further confirmed
the identity of the PCR products as being
the original target genes. Kinetic analysis (14-32 
cycles) was then performed in each case to deter- 
mine the location of the mid-log phase. 

For the semi-quantitative analysis of each
target gene, RT-PCR reactions were carried out in
triplicate for each sample to reduce the effect of
intertube RT-reaction variations (Kolls et al.,
1993) and pipetting errors. For each gene, a
mastermix containing enough reagents for three times
the number of samples (seven for rat, six for
guinea pig) was prepared except that mRNA was
omitted, the latter being added after aliquoting 49
µl of the mastermix into an appropriate number
of tubes. Amplification of albumin (the reference
gene) was carried out in separate tubes since the
mid-log phase of this gene is at a much lower
cycle number than the target genes due to its high
abundance. All RT-PCR products were analysed
on 2% agarose gels containing 0.5 µg/ml ethidium
bromide. The target gene samples were loaded on
the gel first and run in at 3 V/cm for 10 min. The
corresponding albumin samples were then loaded
and the gel run for a further 1/2 h. In this way, all
RT-PCR products from each target gene and
albumin from the corresponding samples could be
run on the same gel. Gels were photographed
using type 665 posi-neg film (Sigma) and
quantitation of the band intensity was carried out using
a dual wavelength flying spot laser scanner
densitometer (Shimadzu).
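
The bookkeeping in this step can be sketched as follows. The mastermix arithmetic follows the figures in the text (triplicates, 49 µl aliquots). The normalisation function is an assumption: the text names albumin as the reference gene but does not spell out the calculation, so dividing each target band intensity by the albumin intensity from the same replicate and averaging is only one plausible reading. All names are hypothetical.

```python
from statistics import mean


def mastermix_volume_ul(n_samples, replicates=3, aliquot_ul=49):
    """Minimum mastermix volume (µl) for triplicate reactions.

    No allowance for pipetting loss is included.
    """
    return n_samples * replicates * aliquot_ul


def normalised_expression(target_intensities, albumin_intensities):
    """Mean target/albumin band-intensity ratio across replicates.

    Assumes the two lists are paired replicate-by-replicate; this pairing
    is an illustrative choice, not a detail given in the text.
    """
    if len(target_intensities) != len(albumin_intensities):
        raise ValueError("replicate lists must be the same length")
    return mean(t / a for t, a in zip(target_intensities, albumin_intensities))


# Seven rat samples in triplicate need at least 7 * 3 * 49 µl of mastermix.
rat_volume = mastermix_volume_ul(7)
guinea_pig_volume = mastermix_volume_ul(6)
```

The remaining 1 µl of each 50 µl reaction is the diluted mRNA, added after aliquoting.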

2.9. Statistical analysis

Statistical analysis of unpaired samples was carried
out using the two-tailed Student's t-test. Values
were considered statistically significant at
P < 0.05 or less.
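
A minimal sketch of an unpaired two-tailed Student's t-test, using only the Python standard library, is given below. It assumes equal variances (the classical pooled-variance form) and decides significance by comparing |t| with tabulated critical values at P = 0.05 rather than computing an exact P value. The example intensities are invented for illustration and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance

# Standard two-tailed critical t values at P = 0.05, indexed by degrees of freedom.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776,
             5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306}


def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for two unpaired samples."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance, assuming both groups share a common variance.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


def significant_at_05(group_a, group_b):
    """True if the two-tailed test rejects equality of means at P < 0.05."""
    t, df = student_t(group_a, group_b)
    return abs(t) > T_CRIT_05[df]


# Hypothetical normalised intensities: four treated and three control animals.
treated = [2.0, 2.2, 2.4, 2.6]
control = [1.0, 1.1, 1.2]
t_value, dof = student_t(treated, control)
```

With four treated and three control animals the test has 5 degrees of freedom, so |t| must exceed 2.571 for significance at P < 0.05.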



3. Results
Fig. 1. Final displays of differentially expressed genes that
were (1) upregulated and (2) downregulated in rat (A) and
guinea pig (B) livers following 3-day treatment with
Wy-14,643. mRNA extracted from control and treated livers was
used to generate the differential displays using the PCR-Select
cDNA subtraction kit (Clontech). Lane (L) is a 1 Kb DNA
ladder standard and 10 µl of secondary PCR reaction were
loaded in all other lanes.



3.1. Cloning and screening of transcripts 

For both the rat and guinea pig experimental 
groups, cDNA subtraction was carried out in the 
forward (control driving tester) and reverse (tester 
driving control) directions to isolate both upregu- 
lated and downregulated mRNA species respec- 
tively. Using a standard primary hybridisation 
time of 8 h we obtained a substantial amount of 
non-specific products in all the final differential 
displays (data not shown). This background 
smearing was almost completely removed by re- 
ducing the primary hybridisation time to 4 h 
(CLONTECHniques, 1996). Fig. 1 shows the 
ddRT-PCR patterns of genes showing altered
expression in rat and guinea pig liver following
3-day treatment with Wy-14,643. The profiles are
unique for each species, and in each case the 
profile for the upregulated genes (control mRNA 
driving tester mRNA) is different to that obtained 
for the downregulated genes (tester mRNA driv- 
ing control mRNA). 

The practical outcome of the SSH method is 
that a series of differentially expressed genes is 
observed as a ladder on an agarose gel. The 
majority of these gene fragments fall within the 
150-2000 bp range, with bands up to 5 Kbp 
occasionally being observed. Each band may the- 
oretically consist of one or more products of 
similar size, as the gel has a maximum resolution
Fig. 2. Discrimination of different ddRT-PCR products having
the same molecular size using HA-red. Gel (A) is a 2%
standard agarose gel. Gel (B) is a 2% standard agarose gel
containing 1 U/ml HA-red. Band numbers refer to the
sequential bands (largest to smallest) extracted from the original
display of genes upregulated in rat liver following 3-day
treatment with Wy-14,643. Ten microlitres of each PCR reaction
were loaded per lane.



of approximately 1.5% (3 bp per 200). In addi- 
tion, there may be two or more products that are 
the same size, but have a different sequence. 



Therefore some form of discrimination must be
employed to isolate as many of these products as
possible. HA-red screening (Geisinger et al., 1997)
of a number of clones derived from each band 
provided a means to discriminate between differ- 
ent gene species of the same size. A typical exam- 
ple of such a gel is shown in Fig. 2. In total, 88 
and 48 apparently different clones were obtained 
from the final differential expression patterns of 
upregulated and downregulated rat genes, respec- 
tively. Sixty nine and 89 apparently different 
clones were obtained from the final differential 
expression patterns of the upregulated and down- 
regulated guinea pig genes, respectively. 
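
The quoted gel resolution of approximately 1.5% (3 bp per 200) can be turned into a quick check of whether two fragments would be expected to separate as distinct bands. This is a sketch under an assumed interpretation: the size difference is taken relative to the smaller fragment, and the function name is hypothetical. Fragments of identical size but different sequence are never separated by this criterion, which is why a sequence-dependent method such as HA-red screening is needed.

```python
def resolvable_by_size(size_a_bp, size_b_bp, resolution=3 / 200):
    """True if two fragments differ in size by at least the gel resolution.

    resolution is the minimum fractional size difference (3 bp per 200 bp,
    i.e. about 1.5%), measured here against the smaller fragment; that
    choice of denominator is an assumption.
    """
    smaller = min(size_a_bp, size_b_bp)
    if smaller <= 0:
        raise ValueError("fragment sizes must be positive")
    return abs(size_a_bp - size_b_bp) / smaller >= resolution
```

For example, 200 bp and 203 bp fragments sit right at the limit, while 1000 bp and 1010 bp fragments (a 1% difference) would be expected to co-migrate.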

Having identified as many different candidate 
gene products as possible in the screening step I, a 
second screening step was carried out on every 
clone to confirm those that represented true dif- 
ferentially expressed genes. This is necessary since 
no subtraction technique is 100% efficient. The 
approach we used, termed PCR-select differential 
screening (as recommended in Clontech's PCR-se- 
lect cDNA subtraction kit protocol), utilises the 
forward and reverse subtractions as an aid to 
screening for the true differentially expressed 
genes (CLONTECHniques, 1997). Because these 
probes have already undergone subtraction, they 
have been enriched for differentially expressed 
genes and are therefore more sensitive than un- 
subtracted driver/tester cDNA probes for detect- 
ing true differential expression. All the clones that 
were isolated from each display were dotblotted 
and probed with the display from which they were
obtained, plus the corresponding reverse-sub- 
tracted display. An example of such a blot is 
shown in Fig. 3. Clones corresponding to authen- 
tic differentially expressed mRNAs hybridised 
with the subtracted cDNA probe, but not the 
reverse-subtracted probe. We also included in the 
authentic positives, those clones that gave a sub- 
stantially greater signal with the subtracted probe 
compared to the reverse-subtracted probe. False 
positives hybridised with either both probes or 
with neither probe. Of the original 88 upregulated 
and 48 downregulated rat clones selected for this 
screening step, 28 (32%) and 15 (31%)
respectively, were found to be true positives. In the rat,
28 (100%) of the true positive upregulated genes
(Table 2) and 11 (73%) of the true positive
downregulated genes (Table 3) were non-redundant. Of
the original 69 upregulated and 89 downregulated
guinea pig clones selected for this screening step,
48 (70%) and 37 (42%) respectively, were found to
be true positives. Thirty six (75%) of the
upregulated genes (Table 4) and 33 (89%) of the
downregulated genes (Table 5) were non-redundant.
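
The percentages quoted in this paragraph follow directly from the clone counts, as the short check below shows. The counts are taken from the text, with the rat upregulated non-redundant count and the guinea pig upregulated true-positive count read as 28 and 48 respectively, which is what the stated 100% and 70% imply. The dictionary layout and function name are illustrative only.

```python
# (clones screened, true positives, non-redundant true positives)
SCREENING = {
    ("rat", "upregulated"): (88, 28, 28),
    ("rat", "downregulated"): (48, 15, 11),
    ("guinea pig", "upregulated"): (69, 48, 36),
    ("guinea pig", "downregulated"): (89, 37, 33),
}


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


summary = {
    key: {
        "true_positive_pct": percent(true_pos, screened),
        "non_redundant_pct": percent(non_redundant, true_pos),
    }
    for key, (screened, true_pos, non_redundant) in SCREENING.items()
}

for (species, direction), row in summary.items():
    print(species, direction, row)
```

The first percentage in each pair is the yield of the second (dot-blot) screening step; the second is the fraction of true positives that represent distinct genes.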

3.2. Identification of clones

On sequence analysis it was found that some
clones were unsequencable in the first instance
(M13 forward primer) due to long polyA runs
that appeared to prematurely terminate the
sequencing reaction. These clones were therefore
sequenced from the opposite direction using the
M13 reverse primer. Those xenobiotic-modulated
gene products identified to date are listed in
Tables 2 and 3 (rat) and Tables 4 and 5 (guinea pig).




Fig. 3. Dot blots of clones of putative upregulated gene species
isolated from guinea pig liver following 3-day treatment with
Wy-14,643. All clones identified in the stage I screening step
(see methods) were blotted and probed with (A) the differential
display from which they originated (control driving
treated) and (B) the reverse subtraction (treated driving
control). Arrows indicate some of the true differentially expressed



Table 2

Identification of genes that were upregulated in male rat liver
following 3-day treatment with Wy-14,643

FASTA-EMBL gene identification                         Accession No.   Sequence
(rat unless otherwise stated)                                          homology^a (%)

Carnitine octanoyl transferase                         RN26033         99
NCI_CGAP_Li1 (H. sapiens) (EST^b)                      HS1275949       98
Peroxisomal enoyl hydratase-like protein               RN08976         98
Liver fatty acid binding protein                       V01235          96
Soares mouse P3NMF19.5 M. musculus cDNA clone          AA038051        96
Cytochrome P450IVA1                                    RNCYPLA         94
Mit. 3-hydroxy-3-methylglutaryl CoA synthase           RNHMGCOA        94
Rab geranylgeranyl transferase component B             RNRABGERA       94
Genes for 18S, 5.8S, and 28S ribosomal RNAs            RNRRNA          94
Carnitine acetyl transferase (mouse)                   MMRNACAR        92
Soares mouse NML (EST)                                 MM1157113       92
Bone marrow stromal fibroblast (H. sapiens)
  cDNA clone HBMSF2E4 (EST)                            AA545726        92
7.5dpc embryo (mouse) (EST)                            AA408192        92
Alpha-1-macroglobulin                                  RNALPH1M        91
Transferrin                                            RNTRANSA        91
Lecithin:cholesterol acyltransferase                   RNU62803        90
Zn-α2-glycoprotein                                     RN2A2GA         90
Serum albumin                                          RNJALBM         89
Fructose-1,6-bisphosphate 1-phosphohydrolase           RNFBP
Soares mouse melanoma (EST) (S^c)                      AA124706
Soares mouse 3NbMS (EST) (AS^c)                        AA154039
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Table 2 (Continued) 



FASTA-EMBL gene identification                         Accession No.   Sequence
(rat unless otherwise stated)                                          homology^a (%)

17-β-hydroxysteroid dehydrogenase                      RN17BHDT2       87
Soares mouse p3NMF19.5 (EST)                           AA038051        87
Peroxisomal enoyl-CoA:hydratase-3-hydroxyacyl
  CoA bifunctional enzyme                              RNPECOA         85
Integral membrane protein, TAPA-1 (CD81) (mouse)       S45012          81
Soares mouse lymph node (EST)                          MMAA88445       81
H. sapiens (clone zap128) mRNA                         L40401          76
Lysophospholipase homologue (human)                    HSU67963        76
Soares mouse lymph node (EST)                          AA2J7044        74



a Refers to the nucleotide sequence homology between the 
cloned band isolated from the differential display and the 
corresponding gene derived from the EMBL gene sequence 
bank. 

b EST is 'expressed sequence tag' — a gene of as yet 
unknown identity and function. 

c Where sequence homologies were equal in both directions 
of the isolated band, both the sense (S) and antisense (A) 
identities are given. 



In all cases, both the forward and reverse se- 
quence of the target clones were analysed and the 
gene having the highest statistical homology 
noted. 

3.3. RT-PCR analysis of selected clones

The results of a typical RT-PCR semi-quantitation
experiment for transferrin in the rat are given
in Fig. 4 and the results for a total of 12 selected
genes in both the rat and guinea pig are shown in
Table 6.
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Table 3 

Identification of genes that were downregulated in male rat
liver following 3-day treatment with Wy-14,643

FASTA-EMBL gene identification                 Accession No.   Sequence
(rat unless otherwise stated)                                  homology^a (%)

NCI_CGAP_Li1 (H. sapiens) (EST^b) (S^c)        AA484528        99
NCI_CGAP_Pr1 (H. sapiens) (EST) (AS^c)         AA469320        99
UDP-glucuronosyltransferase (UGT2B12)          RN06273         98
Complement component C3                        RNC3            96
Soares mouse placenta (S)                      AA023305        96
Ape (chimpanzee) 28S rRNA (AS)                 PTRGMC          96
Rat CYP2C11                                    RNCYPM1         95
Ribosomal protein S5                           RNRPS5          94
Transthyretin                                  RNTTHY          94
Contrapsin-like protease inhibitor             RNCCP23         89
Prostaglandin F2a (S)                          RN26663         84
β-2-microglobulin (AS)                         RNB2MR          84
Apolipoprotein C-III                           RNAPOA02        82
Parathymosin-alpha (zinc2+-binding protein)    RN11ZNBP        75

a Refers to the nucleotide sequence homology between the
cloned band isolated from the differential display and the
corresponding gene derived from the EMBL gene sequence
bank.

b EST is 'expressed sequence tag', a gene of as yet
unknown identity and function.

c Where sequence homologies were equal in both directions,
both the sense (S) and antisense (AS) identities are given.



4. Discussion 

It is now apparent that all cancers arise from 
accumulated genetic changes within the cell. Al- 
though documenting and explaining these changes 
presents a formidable obstacle to understanding 
the different mechanisms of carcinogenesis, the 
experimental methodology is now available to 
begin attempting this difficult challenge. In order 
to begin the elucidation of the molecular mechanisms
involved in non-genotoxic hepatocarcinogenesis,
we have used SSH to identify a number of
genes that are upregulated or downregulated in
male rat and guinea pig livers following short
term exposure to the PP, Wy-14,643. We have
used the rat model to represent a species susceptible
to the non-genotoxic carcinogenic effect of
PPs and the guinea pig as a resistant species
(Orton et al., 1984; Rodricks and Turnbull, 1987;
Lake et al., 1989; Makowska et al., 1992; Lake et
al., 1993).

Gurskaya et al. (1996), who originally devel- 
oped the SSH technique, cloned the products of 
the secondary PCR reaction and screened a small 
number of randomly selected colonies for differ- 
entially expressed clones using northern hybridisa- 
tion. However, we decided against this approach 



Table 4

Identification of genes that were upregulated in male guinea pig liver following 3-day treatment with Wy-14,643

FASTA-EMBL gene identification (guinea pig unless otherwise stated)    Accession No.    Sequence homology^a (%)

Carboxylesterase                                                       AB010634
Complement C3 protein (GPC3)                                           M34054
Cytosolic aldehyde dehydrogenase (sheep)                               U12761
Catalase (human)                                                       X04076
Mitochondrial aspartate aminotransferase (pig)                         M11732
Elongation factor-1-alpha (rabbit)                                     X62245
NCI_CGAP_Br2 H. sapiens cDNA clone (EST)
  (similar to chick mit. phosphoenolpyruvate carboxykinase)            AA587436         87
Alpha-1-antiproteinase S                                               M57270           83
10-formyltetrahydrofolate dehydrogenase (rat)                          M59861           83
Ribosomal protein L6 (rat)                                             X87107           83
Soares pregnant uterus Nb (EST) (mouse)                                AA156847         83
Mitochondrial citrate transport protein (human)                        L77567           80
Cytoplasmic chaperonin hTRiC5 (human)                                  U17104           80
Alpha-1-antiproteinase F                                               M57271           77
Heterogeneous nuclear ribonucleoprotein C1/C2 (human)                  D28382           77
Soares parathyroid tumour (EST)
  (similar to human serum albumin precursor)                           AA860651         76
Stratagene mouse kidney (EST)                                          AA107327         75
Soares parathyroid tumour NbHPA human cDNA (EST)                       AA860653         74
Soares mouse mammary gland (EST)                                       AA619297         74
cDNA clone 15 004 (EST) (human)                                        H01826           74
Soares senescent fibroblasts (EST) (mouse)                             W52190           74
Preproalbumin (human)                                                  E04315           72
cDNA clone 73 169 (EST) (human)                                        T56624           72
Vitamin D-binding protein (human)                                      L10641           71
oH gene (exon 8) (human)                                               Y11498           71
RL flow sorted chromosome                                              B05457           71
Soares foetal liver spleen (EST) (mouse)                               AA009524         71
Soares foetal heart NbMH19W (EST) (mouse)                              AA009421         69
Soares foetal heart NbHH19W H. sapiens cDNA clone (EST)                W94377           67
Phenylalanine hydroxylase (human)                                      U49897           67
Pyrroline-5-carboxylate dehydrogenase (human)                          U24266           66
Glutathione-S-transferase homologue (human)                            U90313           65
NCI_CGAP_GCB1 (EST) (human)                                            AA769294         65
Protective protein (human)                                             M22960           64
clone 27 375 (EST) (human)                                             N37046           62

atagene colon ( # 937 204) H. sapiens cDNA clone (EST) 


AA 149777 


62 



Refers to the nucleotide sequence horn 1 gy between the cloned band isolated from the differential display and the correspond- 
gene derived from the EMBL gene sequence bank. 
Table 5

Identification of genes that were downregulated in male guinea pig
liver following 3-day treatment with WY-14,643

FASTA-EMBL gene identification                          Accession No. Sequence
(guinea pig unless otherwise stated)                                  homology^a (%)

[illegible]                                             [illegible]   [illegible]
Immunoglobulin                                          [illegible]   [illegible]
Alpha-1-antiproteinase F                                [illegible]   [illegible]
Elongation factor-alpha-1 (rabbit)                      [illegible]   89
Coupling protein G (human)                              X044UV        88
NCI_CGAP_Ov1 (EST^b) (human)                            AA586309      87
Lecithin:cholesterol acetyl transferase (rabbit)        D13668        85
Aldolase B (human)                                      X00270        84
Anti-thrombin III (human)                               E00116        80
Phenylalanine hydroxylase (human)                       K03020        80
Inter-α-trypsin inhibitor (human)                       D38595        79
Normalised rat muscle (EST) (S^c)                       AA849753      78
Normalised rat ovary (EST) (AS^c)                       AA801059      78
Complement factor Ba fragment (human)                                 77
Dihydrodiol dehydrogenase (human)                       U0559o        76
Spot 14 gene (thyroid-inducible hepatic protein)        [illegible]   75
  (human)
BAC clone I74pJ2 (human)                                AC004236      75
Mitochondrial aldehyde dehydrogenase (human)            X05409        74
Preproalbumin (human)                                   E04315        74
NCI_CGAP_Pr9 (EST) (human) (S)                          AA533142      74
Normalised rat placenta (EST) (AS)                      AA851197      74
Heparin sulfate proteoglycan (human)                    J04621        73
cDNA clone 33 992 (EST) (human)                         R24330        73
Retinol dehydrogenase (rat)                             U33501        71
TAPA-1 integral membrane protein (CD81) (mouse)         S45012        71
Complement component c5s                                M35525        70
Apolipoprotein B (pig)                                  L11235        69
cDNA clone 143 918 (EST) (human)                        R76742        68
α-Fibrinogen (human)                                    K02569        68
Soares foetal liver spleen INF (mouse)                  W03726        68
Barstead bowel (EST) (mouse)                            AA232049      67
UDP glucuronosyl transferase (cat)                      AF0309137     66
Myeloid leukaemia cell differentiation protein (MCL-1)  L08246        65
  (human) (S)
STS SHGC-34 987 (human) (AS)                            G27984        65
Soares mouse 3NME125                                    AA222798      64
Stratagene mouse embryonic (EST) (S)                    AA199420      64
Rad 52 (mouse)                                          AF004854      63

a Refers to the nucleotide sequence homology between the cloned band
isolated from the differential display and the corresponding gene
derived from the EMBL gene sequence bank.

b EST is 'expressed sequence tag', a gene of as yet unknown identity
and function.

c Where sequence homologies were equal in both directions, both the
sense (S) and antisense (AS) identities are given.

for several reasons: (1) the kinetics of ligation and transformation
favour the isolation of smaller PCR products, thereby producing a
misrepresentation of larger gene products; (2) northern blot analysis
is notoriously insensitive and is unlikely to confirm expression of
rare transcripts; (3) there is no measurable end point to the
screening of clones produced in this way other than to analyse every
transformed colony. We used instead an alternative approach: after
running out the differential display on a high-resolution agarose gel
(Fig. 1) and overstaining with SYBR Green I to enhance visualisation,
the composite bands were individually extracted, reamplified and
cloned. However, it has been well documented that single bands from
differential displays often contain a heterogeneous mixture of
different products (Mathieu-Daude et al., 1996; Smith et al., 1997).
This is because polyacrylamide gels cannot discriminate between DNA
sequences that differ in size by less than about 0.2% (Sambrook et
al., 1989). High-resolution agarose gels such as those used in this
work are even less sensitive, normally only discriminating products
that differ in size by at least 1.5%. The use of the HA-red screening
step enables resolution of identical or nearly identical sequences
based on their AT content (Wawer et al., 1995) and is sensitive down
to < 1% difference. Furthermore, it is rapid, technically simple and
does not require the use of radiolabels. Geisinger et al. (1997)
originally demonstrated the usefulness of using HA-red to identify
different products cloned from the same band of an RNA differential
display experiment by simultaneously running them in normal agarose
(to discriminate by size) and in normal agarose containing HA-red (to
discriminate by AT content). We have found that this approach is
equally useful for identifying different gene species cloned from the
same band of our SSH display.

Diatchenko et al. (1996) reported that SSH is highly efficient at
producing differentially expressed gene species. However, we also
included a second screening step to further confirm that the clones
isolated from the differential display were indeed differentially
expressed. Duplicate dotblots of the candidate clones were blotted
with the display from which they were originally isolated and with
the 'reverse subtraction' display. To make the reverse-subtracted
probe, the subtractive hybridisation step of the procedure was
carried out using the original tester cDNA as a driver, and the
original driver cDNA as a tester. In this way, clones that are false
positives can be identified through their presence in both blots.
Such false positives most commonly arise through having a very high
abundance in the initial sample or unusual hybridisation properties
(Li et al., 1994).
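
The decision rule behind this second screening step can be written out in a few lines. The sketch below is illustrative only: the function name and the three labels are hypothetical, and it assumes each dotblot has already been scored as a simple present/absent signal, which the text does not specify.

```python
def classify_clone(in_forward_blot: bool, in_reverse_blot: bool) -> str:
    """Classify a dot-blotted clone from its two hybridisation results.

    in_forward_blot: signal with the display the clone was isolated from.
    in_reverse_blot: signal with the 'reverse subtraction' display.
    """
    if in_forward_blot and in_reverse_blot:
        return "false positive"   # present in both blots
    if in_forward_blot:
        return "candidate"        # present only in the original display
    return "not confirmed"        # no signal with the original display


# Hypothetical scores for three clones: (forward, reverse)
scores = {"clone_1": (True, False), "clone_2": (True, True), "clone_3": (False, False)}
calls = {name: classify_clone(*result) for name, result in scores.items()}
```

A clone called "candidate" here would still need its mRNA abundance measured in treated and control samples before being accepted.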



Although the SSH method itself has been shown to be efficient, and
despite the screening step that we included, there is an important
caveat to bear in mind, namely that all clones should be considered
only as 'candidates' until the actual abundance of their mRNA is
quantitated in treated and control samples. Towards this end, we
examined the expression of a limited number of clones using
semi-quantitative RT-PCR. Albumin was used as the reference gene as
we have previously found that the expression of this gene does not
appear to change with the treatment regime that we used (Fig. 4, and
data not shown). There are a number of interesting points to note
from our results. The first is the presence of genes that serve as
appropriate positive controls in the upregulated and downregulated
series. For example, in the rat it can be seen that CYP4A1 expression
increases 14-fold following treatment. Although CYP4A1 mRNA
expression levels following Wy-14,643 treatment have not been
previously reported in this model, the figure compares favourably
with that recorded by Bell et al. (1991), who used RNAse-protection
to quantitate CYP4A1 in rat liver following treatment with
methylclofenapate, another PP. In addition, we also confirmed that
the peroxisomal enoyl-CoA:hydratase-3-hydroxyacyl-CoA bifunctional
enzyme is upregulated 9-fold, in agreement with the findings of Chen
and Crane (1992).
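
Fold changes of this kind can be read as reference-normalised ratios. The following is a minimal sketch, assuming band intensities from densitometry and a simple ratio to the albumin band; the text does not give the exact calculation used, and the function name and the numbers in the usage line are hypothetical.

```python
def fold_change(target_treated: float, ref_treated: float,
                target_control: float, ref_control: float) -> float:
    """Reference-normalised fold change of a transcript after treatment.

    Each target band intensity is divided by the reference (here albumin)
    band from the same sample, then treated is divided by control.
    """
    if min(ref_treated, target_control, ref_control) <= 0:
        raise ValueError("reference and control intensities must be positive")
    normalised_treated = target_treated / ref_treated
    normalised_control = target_control / ref_control
    return normalised_treated / normalised_control


# Hypothetical densitometry values (arbitrary units), not data from the study
example = fold_change(target_treated=3500, ref_treated=1000,
                      target_control=250, ref_control=1000)
```

A value above 1 indicates upregulation and a value below 1 downregulation, which is how the "14 x" and "0.5 x" entries of Table 6 would be interpreted under this assumption.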

A number of genes were downregulated following Wy-14,643 exposure,
including CYP2C11. Corton et al. (1997) reported similar findings and
suggested that this may in part explain why male rats exposed to
Wy-14,643 and some other PPs have high serum estradiol levels, as
estradiol is a substrate for CYP2C11. We have also shown that the
expression of contrapsin-like protease inhibitor (CLPI) was
downregulated by Wy-14,643. This has not previously been reported,
and we suggest that it may be linked to a requirement for increased
availability of amino acids to accommodate the hepatomegaly induced
by treatment. Although little is known of the function of
parathymosin-α (zinc2+-binding protein), it has been shown to
interact with the globular domain of histone H1, suggesting a role in
histone function (Kondili et al., 1996). In contrast to the
downregulation observed in this work, other studies have shown that
parathymosin-α expression is elevated in breast cancer (Tsitsilonis
et al., 1993, 1998), with the implication that parathymosin-α may
somehow be involved in regulating cell proliferation by more than one
mechanism. Transferrin has previously been shown to be downregulated
in rat liver by hypolipidemic PPs (Hertz et al., 1996). It is
therefore interesting to note that we isolated a clone identified as
transferrin from the upregulated display profile. Since we confirmed
by RT-PCR that transferrin is in fact downregulated in the rat
(Fig. 4), we conclude that transferrin was either a false positive or
was incorrectly identified. It could also be that we have isolated a
close relative, splice variant or isoform of transferrin, which
demonstrates a different expression profile under these experimental
conditions. Further investigations are therefore required to
determine which of these possibilities is correct.

One of our most intriguing observations was 
that one gene, CD81, appeared to be upregulated 
in rat liver but downregulated in guinea pig liver 
following Wy- 14,643 exposure. CD81 is a widely 
expressed cell surface protein that is involved in a 
large number of cellular functions, including ad- 
hesion, activation, proliferation and differentia- 
tion (reviewed by Levy et al., 1998). Since all of 
these functions are altered to some extent in car- 
cinogenesis, it is perhaps an important observa- 
tion that CD81 expression is differentially 
regulated in a resistant and sensitive species ex- 
posed to a non-genotoxic carcinogen. 

Albumin and ribosomal genes appear common to all differential
displays and are thus undesirable false positives. However, due to
their high expression in the liver, they are difficult to remove. We
also noted a number of gene species, particularly in the guinea pig,
which were common to both upregulated and downregulated profiles.
Again, the most likely reason for these having arisen is their high
abundance.

A relatively large number of upregulated and downregulated genes were
isolated from guinea pig liver following Wy-14,643 exposure. However,
the guinea pig genome has been relatively poorly characterised and so
many of the clones were identified as resembling genes or ESTs from
other species. Without full-length sequence data it is difficult to
ascertain the accuracy of the assigned identities and this must be
borne in mind when utilising data such as this, for example, in
designing effective primers for RT-PCR studies. Although the actual
isolated clone sequences can be used to do this, their relatively
small size often restricts the ability to design effective primers.
In addition, as we observed with transferrin, using a published
full-length sequence may help to identify false positives.

Table 6

Semi-quantitative RT-PCR analysis of selected gene species in the



By comparing the expression profiles of genes 
showing altered expression in a PP-sensitive spe- 
cies (rat) with a PP-resistant species (guinea pig), 
it was our aim to identify genes that are mecha- 
nistically relevant to the non-genotoxic hepatocar- 
cinogenic action of Wy-14,643. However, few of 
the genes that we have isolated were common to 
both the rat and the guinea pig. This suggests 
either that the molecular mechanisms of response 
in these two species are so different that few genes 
are commonly regulated in response to Wy-14,643 
exposure, or that we have recovered only a small 
proportion of those genes that have altered ex- 
pression. The latter seems the more likely scenario 
since it is perceived that one of the main problems 
of subtractive hybridisation and other differential 
expression technologies is the inability to consis- 
tently isolate rare gene transcripts (Bertioli et al., 
1995). This is potentially problematic in that 
weakly expressed genes may play an important 
role in regulating key cellular processes, and that 
the majority of mRNA species are classified as 

rat and guinea pig* 



Transcript



Putative change of expression following 
treatment according to dotblot 



Change according to RT-PCR 
quantitation 



Rat 



Guinea pig Rat 



Guinea pig 



Albumin

Bifunctional enzyme
CYP2C11

CYP4A1
Catalase

CD81 (TAPA-1)



N/A 

Up 

Down 

Up 

N/A 

Up 



Contrapsin-like protease inhibitor Down

Parathymosin-α (zinc2+ binding Down
protein)

Transferrin Up

UDP-Glucuronosyl transferase Down

Unknown-1 Down

α2-glycoprotein Up



N/A 
N/A 
N/A 

N/A 

Up 

Down 

N/A 

N/A 

N/A 

N/A 

N/A 
N/A 



No change No change 

Upregulated* (9 x ) N/O 
Downregulated* N/D 
(Abolished) 

Upregulated* (14 x) N/D 
No change 
N/O 



N/O 

Upregulated**(1.4 



x ) 
N/D 



Downregulated** 

(0.5 x) 

Downregulated** 

(0.6 x) 

Downregulated* 

(0.5 x) " 

Downregulated** 

(0.2 x ) 

No change (P = 0.06) N/D 

No change N/O 



N/D 

No change 
N/O 



N/A, not applicable; N/O, not optimised; N/D, not done.

P<0.0005;

*P<0.05.
'rare' in abundance (Bertioli et al., 1995). However, in their
original paper describing the SSH technique, Gurskaya et al. (1996)
demonstrated that SSH can enrich rare molecules between 1000- and
5000-fold in a single round of hybridisation. Unfortunately, due to
high background smearing in our initial experiments (which hindered
identification of single bands), we were compelled to reduce the
primary hybridisation time to only 4 h, a step that theoretically is
likely to reduce the number of rare sequences (CLONTECHniques, 1996).
Furthermore, it has been claimed by the manufacturers that, whilst
this technique can identify changes as small as 1.5-fold between the
driver and tester populations, it is best suited to the isolation of
genes that show a greater than 5-fold increase (CLONTECHniques,
1996). In addition, where tester and driver contain genes with large
and small differences in abundance, the SSH method will be biased
towards identifying those genes with the large differences
(CLONTECHniques, 1996). Thus, it is most probable that we have not
isolated all of the more rarely expressed transcripts and those
demonstrating small changes in expression.

One problem that remains is identifying the 
function of genes isolated in SSH experiments as 
described herein, some of which may be crucial to 
the process of carcinogenesis, and are, to date, 
unidentified. However, we have provided evidence 
herein that SSH can be used to begin the process 
of characterising the extent and importance of 
altered gene expression in response to a chemical 
stimulus. The developments of this approach 
should include characterisation of temporal and 
dose responses, and functional analysis studies 
including knockout mice. In combination, such 
studies should make a significant contribution to 
our understanding of the molecular mechanisms 
of action and physiological relevance of gene regulation in
non-genotoxic hepatocarcinogenesis. It should then be possible to
ascertain whether differentially expressed genes are causally or
casually related to the chemical-induced toxicity, which would
represent a substantial mechanistic advance.

It is clear that there are also broader applications for this
experimental approach that go beyond understanding the molecular
mechanisms of peroxisome-proliferator induced non-genotoxic
hepatocarcinogenesis in rodents. The potential medical and
therapeutic benefits of elucidating the molecular changes that occur
in any given cell in progressing from the normal to the carcinogenic
(or other diseased, abnormal or developmental) state are very
substantial. Notwithstanding the lack of complete functional
identification of altered gene expression, gene profiling studies
such as those described herein essentially provide a 'fingerprint'
of each stage of carcinogenesis, and should help in the elucidation
of specific and sensitive biomarkers for different types of cancer.
Amongst other
benefits, such fingerprints and biomarkers could 
help uncover differences in histologically identical 
cancers, and provide diagnostic tests for the earli- 
est stages of neoplasia. In addition, the genes 
identified by this approach may be incorporated 
into gene-chip DNA-arrays, thus providing a 
standard genetic fingerprint for a particular toxin 
treatment in a particular species. Interrogation of 
these gene arrays for an unknown compound that 
has a similar pattern to the known reference 
chemical would then provide evidence that the 
unknown may have a toxicity profile similar to 
the 'standard' fingerprint, thereby serving as a 
mechanistically relevant platform for further de- 
tailed investigations. 
ABSTRACT We have developed high-density DNA microarrays of yeast
ORFs. These microarrays can monitor hybridization to ORFs for
applications such as quantitative differential gene expression
analysis and screening for sequence polymorphisms. Automated scripts
retrieved sequence information from public databases to locate
predicted ORFs and select appropriate primers for amplification. The
primers were used to amplify yeast ORFs in 96-well plates, and the
resulting products were arrayed using an automated microarraying
device. Arrays containing up to 2,479 yeast ORFs were printed on a
single slide. The hybridization of fluorescently labeled samples to
the array was detected and quantitated with a laser confocal scanning
microscope. Applications of the microarrays are shown for genetic and
gene expression analysis at the whole genome level.

The genome sequencing projects have generated and will continue to
generate enormous amounts of sequence data. The genomes of
Saccharomyces cerevisiae, Haemophilus influenzae (1), Mycoplasma
genitalium (2), and Methanococcus jannaschii (3) have been completely
sequenced. Other model organisms have had substantial portions of
their genomes sequenced as well, including the nematode
Caenorhabditis elegans (4) and the small flowering plant Arabidopsis
thaliana (5). Given this ever-increasing amount of sequence
information, new strategies are necessary to efficiently pursue the
next phase of the genome projects: the elucidation of gene expression
patterns and gene product function on a whole genome scale.

One important use of genome sequence data is to attempt to identify
the functions of predicted ORFs within the genome. Many of the ORFs
identified in the yeast genome sequence were not identified in
decades of genetic studies and have no significant homology to
previously identified sequences in the database. In addition, even in
cases where ORFs have significant homology to sequences in the
database, or have known sequence motifs (e.g., protein kinase), this
is not sufficient to determine the actual biological role of the gene
product. Experimental analysis must be performed to thoroughly
understand the biological function of a given ORF's product. Model
organisms, such as S. cerevisiae, will be extremely important in
improving our understanding of other more complex and less
manipulable organisms.

To examine in detail the functional role of individual ORFs and
relationships between genes at the expression level, this work
describes the use of genome sequence information to study large
numbers of genes efficiently and systematically. The procedure was as
follows. (i) Software scripts scanned annotated sequence information
from public databases for predicted ORFs. (ii) The start and stop
position of each identified ORF was extracted automatically, along
with the sequence data of the ORF and 200 bases flanking either side.
(iii) These data were used to automatically select PCR primers that
would amplify the ORF. (iv) The primer sequences were automatically
input into the automated multiplex oligonucleotide synthesizer (6).
(v) The oligonucleotides were synthesized in 96-well format, and (vi)
used in 96-well format to amplify the desired ORFs from a genomic DNA
template. (vii) The products were arrayed using a high-density DNA
arrayer (7-10). The gene arrays can be used for hybridization with a
variety of labeled products such as cDNA for gene expression analysis
or genomic DNA for strain comparisons, and genomic mismatch scanning
purified DNA for genotyping (11).

METHODS 

Script Design. All scripts were written in UNIX Tool Command
Language. Annotated sequence information from GenBank was extracted
into one file containing the complete nucleotide sequence of a single
chromosome. A second file contained the assigned ORF name followed by
the start and stop positions of that ORF. The actual sequence
contained within the specified range, along with 200 bases of
sequence flanking both sides, was extracted and input into the primer
selection program PRIMER 0.5 (Whitehead Institute, Boston). Primers
were designed so as to allow amplification of entire ORFs. The
selected primer sequences were read by the 96-well automated
multiplex oligonucleotide synthesizer instrument for primer
synthesis. The forward and reverse primers were synthesized in two
separate 96-well plates in corresponding wells. All primers were
synthesized on a 20-nmol scale.
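
The extraction step (ORF coordinates plus 200 flanking bases) can be sketched as follows. This is a Python illustration and not the authors' Tool Command Language scripts; the function names, the 1-based inclusive coordinate convention, the clipping at chromosome ends, and the treatment of start > stop as a reverse-strand ORF are all assumptions made for the example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def extract_orf_with_flanks(chromosome: str, start: int, stop: int,
                            flank: int = 200) -> str:
    """Return the ORF plus `flank` bases of sequence on each side.

    Assumptions (not stated in the text): coordinates are 1-based and
    inclusive, start > stop marks an ORF on the reverse strand, and
    flanks are clipped where they would run off the chromosome.
    """
    low, high = min(start, stop), max(start, stop)
    left = max(low - 1 - flank, 0)                # 0-based slice start
    right = min(high + flank, len(chromosome))    # slice end (exclusive)
    region = chromosome[left:right].upper()
    return reverse_complement(region) if start > stop else region


# Toy chromosome: a 9-base ORF embedded in 300 bases of filler on each side
toy = "A" * 300 + "ATGAAATAA" + "C" * 300
template = extract_orf_with_flanks(toy, start=301, stop=309)
```

The returned string is what would be handed to a primer selection program, so that primers can be placed in the flanks and the whole ORF amplified.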

ORF Amplification and Purification. Genomic DNA was isolated as
described (12) and used as template for the amplification reactions.
Each PCR was done in a total volume of 100 µl. A total of 0.2 µM each
of forward and reverse primers were aliquoted into a 96-well PCR
plate (Robbins Scientific, Sunnyvale, CA); a master mix containing
0.24 mM each dNTP, 10 mM Tris (pH 8.5), 50 mM MgCl2, 2.5 units Taq
polymerase, and 10 ng of template was added to the primers, and the
entire mix was thermal cycled for 30 cycles as follows: 15 min at
94°C, 15 min at 54°C, and 30 min at 72°C. Products were ethanol
precipitated in polystyrene v-bottom 96-well plates (Costar). All
samples were dried and stored at -20°C.

Arraying Procedure and Processing. Microarrays were 
made as described (8). 

A custom built arraying robot was used to print batches of 48 slides.
The robot utilizes four printing tips which simultaneously pick up
~1 µl of solution from 96-well microtiter plates. After printing, the
microarrays were rehydrated for 30 sec in a humid chamber and then
snap dried for 2 sec on a hot plate (100°C). The DNA was then UV
crosslinked to the surface by subjecting the slides to 60 millijoules
of energy. The rest of the poly-L-lysine surface was blocked by a
15-min incubation in a solution of 70 mM succinic anhydride dissolved
in a solution consisting of 315 ml of 1-methyl-2-pyrrolidinone
(Aldrich) and 35 ml of 1 M boric acid (pH 8.0). Directly after the
blocking reaction, the bound DNA was denatured by a 2-min incubation
in distilled water at ~95°C.

Abbreviation: YEP, yeast extract/peptone. 
Fig. 1. Two-color fluorescent scan of a yeast microarray containing
2,479 elements (ORFs). The center-to-center distance between elements
is 345 µm. A probe mixture consisting of cDNA from yeast
extract/peptone (YEP) galactose (green pseudocolor) and YEP glucose
(red pseudocolor) grown yeast cultures was hybridized to the array.
Intensity per element corresponds to ORF expression and pseudocolor
per element corresponds to relative ORF expression between the two
cultures.

The slides were then transferred into a bath of 100% ethanol at 
room temperature. 

Probe Preparation: cDNA. Yeast cultures (100 ml) were grown to
~1 OD600 and total RNA was isolated as described (13). Up to 500 µg
total RNA was used to isolate mRNA (Qiagen, Chatsworth, CA).
Oligo(dT)20 (5 µg) was added and annealed to 2 µg of mRNA by heating
the reaction to 70°C for 10 min and quick chilling on ice, plus 2 µl
Superscript II (200 units/µl) (Life Technologies, Gaithersburg, MD),
0.6 µl 50× dNTP mix (final concentrations were 500 µM dATP, dCTP,
dGTP, and 200 µM dTTP), 6 µl 5× reaction buffer, and 60 µM Cy3-dUTP
or Cy5-dUTP (Amersham). Reactions were carried out at 42°C for 2 h,
after which the mRNA was degraded by the addition of 0.3 µl 5 M NaOH
and 0.3 µl 100 mM EDTA and heating to 65°C for 10 min. The sample was
then diluted to 500 µl with TE and concentrated using a Microcon-30
(Amicon) to 10 µl.

Probe Preparation: Genomic DNA. Fluorescent DNA was prepared from
total genomic DNA as follows: 1 µg of random nonamer oligonucleotides
was added to … µg of genomic DNA. This mixture was boiled for 2 min
and then chilled on ice. A reaction mixture containing dNTPs (25 µM
dATP, dCTP, dGTP, 10 mM dTTP, and 40 Cy3-dUTP or Cy5-dUTP), reaction
buffer (New England Biolabs), and 20 units of exonuclease-free Klenow
enzyme (United States Biochemical) was added, and the reaction was
incubated at 37°C for 2 h. The sample was then diluted to 500 µl with
TE and concentrated using a Microcon-30 (Amicon) to 10 µl.

Hybridization. Purified, labeled probe was resuspended in 11 µl of
3.5× SSC containing 10 ng Escherichia coli tRNA, and 0.3% SDS. The
sample was then heated for 2 min in boiling water, cooled rapidly to
room temperature, and applied to the array. The array was placed in a
sealed, humidified, hybridization chamber. Hybridization was carried
out for 10 h in a 62°C water bath, after which the arrays were washed
immediately in 2× SSC/0.2% SDS. A second wash was performed in
0.1× SSC.

Analysis and Quantitation. Arrays were scanned on a scanning laser
fluorescence microscope developed by Steve Smith with software
written by Noam Ziv (Stanford University). A separate scan was done
for each of the two fluorophores used. The images were then combined
for analysis. A bounding box, fitted to the size of the DNA spots,
was placed over each array element. The average fluorescent intensity
was calculated by summing the intensities of each pixel present in a
bounding box and then dividing by the total number of pixels. Local
area background was calculated for each array element by determining
the average fluorescent intensity at the edge of the bounding box. To
normalize for fluorophore-specific variation, control spots
containing yeast genomic DNA were applied to each quadrant during the
arraying process. These elements were quantitated and the ratios of
the signals were determined. These ratios were then used to normalize
the photomultiplier sensitivity settings such that the ratios of the
fluorescence of the genomic DNA spots were close to a value of 1.0.
The average signal intensity at any given spot was regarded as
significant if it was at least two standard deviations above
background. Each experiment was conducted in duplicate, with the
fluorophores representing each channel reversed. The ratios presented
here are the average of the two experiments, except in the case in
which the signal for the element in question was below the
reliability threshold. The reliability threshold also determined the
dynamic range of the experiment. For all of the experiments
presented, the average dynamic range was ~1 to 100. In the case where
the fluorescence from a very bright spot saturates the detector,
differential ratios will, in general, be underestimated. This can be
compensated for by scanning at a lower overall sensitivity.
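
The per-element quantitation described above can be sketched in a few lines. This is an illustrative Python outline, not the software used in the study: the function names are hypothetical, the image is assumed to be a list of rows of pixel intensities, and the standard deviation of the edge pixels is used as a stand-in for the background standard deviation, which the text does not define.

```python
from statistics import mean, pstdev


def box_pixels(image, top, left, bottom, right):
    """All pixel values inside the bounding box (inclusive indices)."""
    return [image[r][c]
            for r in range(top, bottom + 1)
            for c in range(left, right + 1)]


def edge_pixels(image, top, left, bottom, right):
    """Pixel values lying on the perimeter of the bounding box."""
    return [image[r][c]
            for r in range(top, bottom + 1)
            for c in range(left, right + 1)
            if r in (top, bottom) or c in (left, right)]


def quantify_spot(image, top, left, bottom, right):
    """Return (mean signal, local background, significant?) for one element."""
    signal = mean(box_pixels(image, top, left, bottom, right))
    edge = edge_pixels(image, top, left, bottom, right)
    background = mean(edge)
    # Assumption: spread of the edge pixels approximates background noise.
    threshold = background + 2 * pstdev(edge)
    return signal, background, signal >= threshold


def dye_swap_ratio(ratio_first, ratio_swapped):
    """Average the sample-A/sample-B ratios from the two dye-swapped runs."""
    return (ratio_first + ratio_swapped) / 2
```

Both ratios passed to `dye_swap_ratio` are assumed to be expressed in the same direction (sample A over sample B), so the ratio from the fluorophore-reversed hybridization must be inverted before averaging if it was measured channel-wise.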

RESULTS 

The accumulation of sequence information from model organisms
presents an enormous opportunity and challenge to understand the
biological function of many previously uncharacterized genes. To do
this accurately and efficiently, a directed strategy was developed
that enables the monitoring of multiple genes simultaneously.
Microarraying technology provides a method by which DNA can be
attached to a glass surface in a high-density format (8). In
practice, it is possible to array over 6,000 elements […]. With a
total of ~6,100 ORFs, the entire set of yeast genes can be spotted
onto a single glass slide.

With this capability and the availability of the entire sequence of
the yeast genome, our strategy was to use a directed approach for
generating the complete genome array. This procedure involved
synthesizing a pair of oligonucleotide primers to amplify each ORF.
The PCR product containing each gene of interest was arrayed onto
glass and used, for example, as probe for monitoring gene expression
levels by hybridizing to the array labeled cDNA generated from
isolated mRNA of a culture grown under any experimental condition.

Primer Selection and Synthesis. The primer selection was fully
automated using Tool Command Language scripts and PRIMER 0.5
(Whitehead). Primer pairs were automatically selected successfully
for >99% of the ORFs tested. Primer sequences can thus be selected
rapidly with minimal manual processing. A complete set of forward and
reverse primers were selected initially for each ORF on chromosomes
I, II, III, V, VI, VIII, IX, X, and XI. Primers for a representative
set of ORFs (15% coverage) were chosen for the remaining chromosomes.
With the release of the entire yeast genome sequence, the complete
set of primers has now been selected.

Because each ORF requires a unique pair of synthetic primers, a total
of approximately 12,200 oligonucleotides will be required to
individually amplify each target. This costly component was addressed
with the automated multiplex oligonucleotide synthesizer (6), which
efficiently synthesizes primers in a 96-well format. Each primer,
synthesized on a 20-nmol scale, provides enough material for 100
amplification reactions, whereas a given PCR product provides enough
material to generate an element on
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ORF 

YLR142 
YOL140 
YGL148 
YFL014 
YBR072 
YBRQ54 
YCR021 
YER103 
YLR259 
YBR169 
YBL075 
YPL240 
YDR258 
YNL007 
YEL030 
YHR064 
YBL008 
YBL002 
YBL0Q3 
YBR010 
YBR009 
YDR343 
YHR092 
YAR071 
YLR096 
YER102 
YBR181 
YCR031 
YLR441 
YHR141 
YBL072 
YHL015 
YBR191 
YLR340 
YGL123 
YLR194 



Gene 

PUT1 
ARG8 
AR02 
HSP12 
HSP26 
YR02 
HSP30 
SSA4 
HSP60 
SSE2 
SSA3 
HSP82 
HSP78 
SIS1 



HIR1 
HTB2 
HTA2 
HHT1 
HHF1 
HXT6 
HXT4 
PHOll 
KIN2 
RPS8B 
RPS101 
CRY1 
RP10A 
RPU1B 
RPS8A 
URP2 
URP1 
RPLAO 
SUP44 
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Description 

Proline oxidase 
Acetylornithine aminotransferase 
Chorismate synthase 
Heat shock protein 
Heat shock protein 

Similarity to HSP30 heat shock protein Yrolp 
Heat shock protein 
Heat shock protein 

Mitochondrial heat shock protein HSP60 
Heat shock protein of the HSP70 family 
Cytoplasmic heat shock protein 
Heat shock protein 

°' <** - ATTHkpB*. p roteM e, 

70-kDa heat shock protein 

Heat shock protein 

Histone transcription regulator 

Histonc H2B.2 

Histone H2A^ 

Histone H3 

Histone H4 

High-affinity hexose transporter 

Moderate- to low-affinity glucose transporter 

Secreted acid phosphatase, 56 kDa isozyme 

Scr/Thr protein kinase 

RibosomaJ protein S8.e 

RibosomaJ protein S6.e 

40S ribosomal protein S14.e 

RibosomaJ protein S3.a.e 

RibosomaJ protein L36a.e 
Ribosomal protein S8.e 
Ribosomal protein 
Ribosomal protein L21.e 
Acidic RibosomaJ protein LlO.e 
RibosomaJ protein 
Hypothetical protein 



500-1,000 arrays. Thus, a single primer pair provides enoueh 
starting material for up to -50,000 arrays. 
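
The figures quoted above follow from simple multiplication. The sketch below reproduces them; the variable names are illustrative, and the lower bound of 500 arrays per PCR product is used because it yields the ~50,000 figure stated in the text.

```python
# Values taken from the text; names are illustrative.
orf_count = 6100                 # approximate number of yeast ORFs
primers_per_orf = 2              # one forward and one reverse primer
reactions_per_primer = 100       # amplification reactions per 20-nmol synthesis
arrays_per_pcr_product_low = 500 # lower bound of the 500-1,000 range

# Total oligonucleotides needed to amplify every ORF individually.
total_oligos = orf_count * primers_per_orf

# Arrays obtainable from one primer pair, using the lower bound per PCR product.
arrays_per_primer_pair = reactions_per_primer * arrays_per_pcr_product_low

print(total_oligos, arrays_per_primer_pair)
```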

Primers were synthesized to amplify yeast ORFs. Primer
synthesis had a failure rate of <1% in over 18 plates of
synthesis as determined by standard trityl analysis (6). The
success rate of PCR amplifications using the primer pairs
was 94% based on agarose gel analysis of each PCR. The
purified PCR products were used to generate arrays. Two
versions of the arrays were created for the experimental results
presented here. The first array contained 2,287 elements and
the second array batch contained 2,479 elements.

Genome Arrays. The amplified ORFs were arrayed onto glass
at a spacing of 345 microns (Fig. 1). The high-density spacing of
DNA samples allows the hybridization volumes to be minimized;
volumes are a maximum of 10 µl. The labeled probe can
thus be maintained at relatively high concentrations, making 1-2
µg of mRNA sufficient for analysis. This also obviates the need
for a subsequent amplification step and thus avoids the risk of
altering the relative ratios of different cDNA species in the
sample.

Genetic Analysis: Genomic Comparison of Unrelated Strains.
Microarrays allow efficient comparison of the genomes of different
strains. Genomic DNA from Y55, an S. cerevisiae strain
divergent from the reference strain S288c, was randomly labeled
with Cy3-dUTP and hybridized simultaneously with the S288c
DNA labeled with Cy5-dUTP. When a comparison between the
hybridization of the DNA from the two strains was done, several
elements gave relatively little or no signal above background from
the Cy3 channel (data not shown). These include SGE1,
ASP3A-D, YLR156, YLR159, YLR161, [illegible], and
YCR105. These results suggest that the regions
containing these genes are extremely divergent, or altogether
deleted from the strain. Subsequent attempts to generate PCR
products from SGE1, ENA2, and ASP3A using Y55 DNA failed.
This result supports the conclusion that these genes are likely to
[illegible] regions absent in the Y55 genome have been
[illegible] to be deleted in mutant laboratory
strains (14-16). In particular, the Asp-3 region appears to be
highly prone to being deleted (15, 16).

These results indicate that gene arrays can be used to efficiently
[illegible] organism for large deletion polymorphisms.
A single hybridization and scan will reveal differences
[illegible] differential hybridization to particular elements. It is
reasonable to suppose that an equivalent number of genes are
present in the Y55 genome and absent in the S288c genome. This
result should be viewed as a minimum estimate of the [illegible]
polymorphisms that exist between these two unrelated strains, as
intergenic deletions or small intragenic deletions would not be
detected because considerable hybridizing material would
remain. Sequence polymorphisms, such as deletions, are present
in populations of every species and must at some level affect
[illegible]. One of the challenges of the genome era will be to
[illegible] sequence polymorphisms that exist in the
natural gene pool relative to the reference genome sequence.

[Fig. 2 graphic: bar chart of ORF categories. Legible category labels include Heat Shock, Protein Kinase, DNA repair, Ribosomal protein, Alcohol, and tRNA synthetase.]



Gene Expression Analysis. The arrays were used to examine
gene expression in yeast grown under a variety of different
conditions. Expression analysis is an ideal application of these
arrays because a single hybridization provides quantitative expression
data for thousands of genes. To better understand results for
genes of known function, ORFs were placed in biologically relevant
categories on the basis of function (e.g., amino acid catabolic
genes) and/or pathways (e.g., the histidine biosynthesis pathway).

FIG. 2. ORF categories displaying differential
expression between heat shocked
and untreated cultures. Bars within categories
correspond to individual ORFs.
Green shaded bars correspond to relative
increases in ORF expression under 25°C
growth conditions. Red shaded bars correspond
to relative increases in ORF expression
under 39°C growth conditions.

Table 2. Cold shock vs. control expression data (ratio of gene expression)



[Ratios were printed in either the Control or the Cold column; the column assignment is not legible.]

Ratio        ORF      Gene     Description
[illegible]  YOR153   PDR5     Pleiotropic drug resistance protein
2.4          YCR012   PGK1     Phosphoglycerate kinase
2.9          YCL040   GLK1     Aldohexose specific glucokinase
1.4          YHR064            Heat shock protein
2.0          YJL034   KAR2     Nuclear fusion protein
2.1          YDR258   HSP78    Mitochondrial heat shock protein of clpb family of ATP-dependent proteases
12           YLL039   UBI4     Ubiquitin precursor
2.7          YLL026   HSP104   Heat shock protein
3.1          YER103   SSA4     Heat shock protein
33           YBR126   TPS1     α,α-Trehalose-phosphate synthase (UDP-forming)
3.8          YPL240   HSP82    Heat shock protein
7.9          YBR054   YRO2     Similarity to HSP30 heat shock protein Yro1p
7.9          YBR072   HSP26    Heat shock protein
             YCR021   HSP30    Heat shock protein
1.8          YDR343   HXT6     High-affinity hexose transporter
2.1          YHR096   HXT5     Putative hexose transporter
2.4          YFR053   HXK1     Hexokinase I
2.8          YHR092   HXT4     Moderate- to low-affinity glucose transporter
3.4          YHR094   HXT1     Low-affinity hexose (glucose) transporter
23           YHR089   GAR1     Nucleolar rRNA processing protein
1.7          YLR048   NAB1B    40S ribosomal protein p40 homolog b
1.7          YLR441   RP10A    Ribosomal protein S3a.e
1.7          YLL045   RPL4B    Ribosomal protein L7a.e.B
1.6          YLR029   RPL13A   Ribosomal protein L15.e
1.6          YGL123   SUP44    Ribosomal protein
3.1          YBR067   TIP1     Cold- and heat-shock-induced protein of the Srp1/Tip1p family
12           YER011   TIR1     Cold-shock-induced protein of the Tir1p, Tip1p family
2.0          YCR058            Hypothetical protein
4.2          YKL102            Hypothetical protein

Table 3. Glucose vs. galactose expression data

Ratio of gene expression
Glucose  Galactose    ORF      Gene      Description
2.1                   YHR018   ARG4      Argininosuccinate lyase
3.5                   YPR035   GLN1      Glutamate-ammonia ligase
2.8                   YML116   ATR1      Aminotriazole and 4-nitroquinoline resistance protein
10                    YMR303   ADH2      Alcohol dehydrogenase II
3.7                   YBR145   ADH5      Alcohol dehydrogenase V
         3.2          YBL030   AAC2      ADP, ATP carrier protein 2
         19           YBR085   AAC3      ADP, ATP carrier protein
         17           YDR298   ATP5      H+-transporting ATP synthase δ chain precursor
         15           YBR039   ATP3      H+-transporting ATP synthase γ chain precursor
         5.5          YML054   CYB2      Lactate dehydrogenase cytochrome b2
         3.4          YML054   CYB2      Lactate dehydrogenase cytochrome b2
         13           YKL150   MCR1      Cytochrome-b5 reductase
         [illegible]  YBL045   COR1      Ubiquinol-cytochrome c reductase 44K core protein
         33           YDL067   COX9      Cytochrome c oxidase chain VIIA
         17           YLR038   COX12     Cytochrome c oxidase, subunit VIB
         16           YHR051   COX6      Cytochrome c oxidase subunit VI
         14           YLR395   COX8      Cytochrome c oxidase chain VIII
         13           YFR033   QCR6      Ubiquinol-cytochrome c reductase 17K protein
         23.7         YLR081   GAL2      Galactose (and glucose) permease
         21.9         YBR018   GAL7      UDP-glucose-hexose-1-phosphate uridylyltransferase
         21.8         YBR020   GAL1      Galactokinase
         19.5         YBR019   GAL10     UDP-glucose 4-epimerase
         14.7         YLR081   GAL2      Galactose (and glucose) permease
         8.6          YDR009   GAL3      Galactokinase
         3.0          YML051   GAL80(1)  Negative regulator for expression of galactose-induced genes
         18           YML051   GAL80(2)  Negative regulator for expression of galactose-induced genes
2.7                   YER055   HIS1      ATP phosphoribosyltransferase
3.4                   YBR248   HIS7      Glutamine amidotransferase/cyclase
7.4                   YCL030   HIS4      Phosphoribosyl-AMP cyclohydrolase/phosphoribosyl-ATP pyrophosphatase/histidinol dehydrogenase
5.8                   YKR080   MTD1      Methylenetetrahydrofolate dehydrogenase (NAD+)
6.0                   YDR019   GCV1      Glycine decarboxylase T subunit
6.1                   YLR058   SHM2      Serine hydroxymethyltransferase
         8.1          YML123   PHO84     High-affinity inorganic phosphate/H+ symporter
3.5                   YDR408   ADE8      Phosphoribosylglycinamide formyltransferase (GART)
3.6                   YDR408   ADE8      Phosphoribosylglycinamide formyltransferase (GART)
4.4                   YAR015   ADE1      Phosphoribosylamidoimidazole-succinocarboxamide synthase
5.6                   YMR300   ADE4      Amidophosphoribosyltransferase
5.6                   YOR128   ADE2      Phosphoribosylaminoimidazole carboxylase
6.0                   YGL234   ADE5,7    Phosphoribosylamine-glycine ligase and phosphoribosylformylglycinamidine cyclo-ligase
         63           YBL015   ACH1      Acetyl-CoA hydrolase



Heat Shock Results. A log phase culture growing in YEP/
dextrose medium at 25°C was split in half. One half of the
culture remained at 25°C whereas the other half of the culture
was shifted to 39°C. mRNA was isolated from both cultures 1 h
after heat shock for comparison on microarrays and, although
this time point is not optimal for measuring induction of heat
shock mRNAs (17), many known heat shock genes exhibited
considerable induction at this time point (Table 1; Fig. 2).
Down-regulation of genes in the ribosomal protein and histone
gene categories was also observed. Differential expression
between the heat-shocked culture and the control was also
observed for many other genes. Genes in many categories, such
as amino acid catabolism and amino acid synthesis, exhibited
a mixed response with some genes showing little or no
differential expression and other genes showing a significant
increase or decrease in gene expression in response to heat
shock (Table 1; Fig. 2).

Cold Shock Results. A log phase culture growing in YEP/
dextrose medium at 37°C was split in half. One half of the
culture remained at 37°C while the other half of the culture was
shifted to 18°C. mRNA was isolated from both cultures 1 h
after cold shock for comparison on microarrays. As expected,
two known cold shock genes (TIP1, TIR1) were expressed at
a significantly higher level in the cold-shocked culture. Genes
in other functional categories, such as glucose metabolism and
heat shock, displayed a mixed response with expression of some
genes being unaffected and other genes exhibiting significant
up- or down-regulation in response to cold shock (Table 2).

Steady-State Galactose vs. Glucose Results. mRNA was
isolated from steady-state log phase YEP galactose and YEP
glucose grown cultures for comparison on the microarrays. As
expected, the GAL genes were expressed at a much higher
level in the galactose culture. Many genes were differentially
expressed in these cultures that were not a priori expected to
exhibit differential expression. For example, some genes in the
amino acid catabolic category were up-regulated in the galactose
culture whereas genes in the one-carbon metabolism and
purine categories were largely or entirely down-regulated in
the galactose culture (Table 3). Genes in other categories, such
as amino acid synthesis, ABC transporter, cytochrome c, and
cytochrome o, exhibited mixed responses; some genes in a
category showed little or no obvious differential expression
whereas other genes in the same category showed significant
differential expression in the galactose and glucose cultures.

DISCUSSION

The results of these experiments show that many genes are
differentially expressed under the three environmental conditions
described here. The expected and predicted changes in gene
expression, such as HSP12 in the heat-shocked culture, TIP1 in
the cold-shocked culture, and GAL2 in the steady-state galactose
culture, were observed in every case. However, in addition to the
expected changes in gene expression, significant differential
expression was also observed for many other genes that would
not, a priori, be expected to be differentially expressed. For
example, expression of PHO11 decreased and expression of
YLR194, KIN2, and HXT6 increased in the heat-shocked culture.
Expression of MST1 and APE3 decreased and expression of
PDR5 and GAR1 increased in the cold-shocked culture. In
addition, ADE4 and SER2 were expressed at reduced levels
whereas PHO84 and ACH1 were expressed at higher levels in
cells grown in galactose compared with cells grown in glucose.
Differential expression of these and many other genes was specific
to one of these three environmental conditions.

Many other genes were found to be differentially expressed
under more than one condition. When differentially expressed
genes in cold- and heat-shocked cultures were compared, 30
genes were found in common. Of these 30 genes, 28 showed
inverse expression (i.e., increased expression under one condition
and decreased expression under the other condition). Two genes,
YCR058 and YKL102, showed elevated expression in response to
both cold and heat shock. Fifteen genes were found to be
differentially expressed in both the heat-shocked and steady-state
galactose cultures: 9 genes showed increased expression and 5
showed decreased expression under both conditions. Twenty
genes were differentially expressed in both the cold-shocked and
steady-state galactose cultures: 8 genes showed decreased expression
and 5 genes showed increased expression under both conditions.
Six genes showed increased expression in the galactose
culture and decreased expression in the cold-shocked culture.
One gene (ODP1) showed increased expression in both the
cold-shocked and steady-state galactose cultures.
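
The kind of cross-condition comparison described above can be sketched with set operations. The gene names and directions below are placeholders invented for illustration (+1 for increased, -1 for decreased expression); they are not data from this study, and the function name is hypothetical.

```python
def compare_conditions(cond_a, cond_b):
    """Return genes shared by two conditions, split by same vs. inverse direction.

    Each argument maps a gene name to +1 (increased) or -1 (decreased).
    """
    common = sorted(set(cond_a) & set(cond_b))
    same_direction = [g for g in common if cond_a[g] == cond_b[g]]
    inverse = [g for g in common if cond_a[g] != cond_b[g]]
    return common, same_direction, inverse


# Placeholder data only.
cold_shock = {"geneA": +1, "geneB": -1, "geneC": +1, "geneD": -1}
heat_shock = {"geneA": +1, "geneB": +1, "geneC": -1, "geneE": +1}

common, same_direction, inverse = compare_conditions(cold_shock, heat_shock)
print(common, same_direction, inverse)
```

Applied to real lists of differentially expressed genes, the same three lists would correspond to the genes in common, those changing in the same direction, and those showing inverse expression.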

Gene expression is affected in a global fashion when environmental
conditions are changed and both expected and unexpected
genes are affected. There is also overlap in the genes that
are differentially expressed under quite different environmental
conditions. These results can be rationalized by considering the
high degree of cross-pathway regulation in yeast. For example,
there is evidence for cross-pathway regulation between (i) carbon
and nitrogen metabolism (18), (ii) phosphate and sulfate metabolism
(19), and (iii) purine, phosphate, and amino acid metabolism
(20-24). There are also examples of the interaction of
general and specific transcription factors (25, 26). Finally, within
the broad class of amino acid biosynthetic genes, there is evidence
for amino acid specific regulation of some genes, regulation via
general control for other genes, and regulation via both specific
and general control for other genes (22, 27-30).

Cross-pathway regulation arises from the complex structure
of promoters. Virtually all promoters contain sites for multiple
transcription factors and, therefore, virtually all genes are
subject to combinatorial regulation. For example, the HIS4
promoter contains binding sites for GCN4 (the general amino
acid control transcription factor), PHO2/BAS2 (a transcriptional
regulator of phosphatase and purine biosynthetic
genes), and BAS1 (a transcriptional regulator of purine biosynthetic
genes) (31). It is likely that the complex effects on
gene expression described in this work are a direct consequence
of the combinatorial regulation of gene expression.

These findings illustrate the power of the highly parallel whole
genome approach when examining gene expression. The global
effects of environmental change on gene expression can now be
directly visualized. It is clear that determining the mechanism(s)
and the functional role of the dramatic global effects on gene
expression in different environments will be a significant challenge.
The era of whole genome analysis will, ultimately, allow
researchers to switch from the very focused single gene [illegible]
complex network of gene regulatory pathways.
With the entire sequence of this model organism known, new
[illegible] analyses (32, 33) of gene function. The genome microarrays
[illegible] yeast genome. This pilot study uses arrays containing >35% of
[illegible] to be used.
The genome arrays provide for a robust, fully automated
approach toward examining genome structure and gene function.
They allow for comparisons between different [illegible].
This research will help to elucidate relationships [illegible]
genes and allow the researcher to understand gene function [illegible]
understanding expression patterns across the yeast genome.

[illegible] provided by National Institutes of Health Grant [illegible]

Exploring the Metabolic and Genetic Control of
Gene Expression on a Genomic Scale

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used
to carry out a comprehensive investigation of the temporal program of gene expression
accompanying the metabolic shift from fermentation to respiration. The expression
profiles observed for genes with known metabolic functions pointed to features of the
metabolic reprogramming that occur during the diauxic shift, and the expression patterns
of many previously uncharacterized genes provided clues to their possible functions. The
same DNA microarrays were also used to identify genes whose expression was affected
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcriptional
activator YAP1. These results demonstrate the feasibility and utility of this
approach to genomewide exploration of gene expression patterns.



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2),
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially
favorable organism in which to conduct a
systematic investigation of gene expression.
The genes are easy to recognize in the genome
sequence, cis regulatory elements are
generally compact and close to the transcription
units, much is already known
about its genetic regulatory mechanisms,
and a powerful set of tools is available for its
analysis.

Department of Biochemistry, Stanford University School
of Medicine, Howard Hughes Medical Institute, Stanford,
CA 94305-5428, USA.

*To whom correspondence should be addressed. E-mail:
pbrown@cmgm.stanford.edu

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by
using a simple robotic printing device (9).
Cells from an exponentially growing culture
of yeast were inoculated into fresh medium
and grown at 30°C for 21 hours. After an
initial 9 hours of growth, samples were harvested
at seven successive 2-hour intervals,
and mRNA was isolated (10). Fluorescently
labeled cDNA was prepared by reverse transcription
in the presence of Cy3(green)-
or Cy5(red)-labeled deoxyuridine triphosphate
(dUTP) (11) and then hybridized to
the microarrays (12). To maximize the reliability
with which changes in expression
levels could be discerned, we labeled cDNA
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3- 
labeled "reference" cDNA sample prepared 
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity 
measured for the Cy3 and Cy5 fluors at 
each array element provides a reliable mea- 
sure of the relative abundance of the corre- 
sponding mRNA in the two cell popula- 
tions (Fig. 1). Data from the series of seven 
samples (Fig. 2), consisting of more than 
43,000 expression-ratio measurements, 
were organized into a database to facilitate 
efficient exploration and analysis of the 
results. This database is publicly available 
on the Internet (13). 

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the 
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels dif- 
fered by a factor of 2 or more for only 19 
genes (0.3%), and the largest of these dif- 
ferences was only 2.7-fold (14). However, as 
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for
203 genes diminished by a factor of at least
4. About half of these differentially expressed
genes have no currently recognized
function and are not yet named. Indeed,
more than 400 of the differentially expressed
genes have no apparent homology
to any gene whose function is known (15).
The responses of these previously uncharacterized
genes to the diauxic shift therefore
provide the first small clue to their possible
roles.
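
Counts such as those reported above can be obtained by thresholding expression ratios. The sketch below assumes each ratio is expressed as sample divided by reference; the gene names and values are placeholders and not measurements from this study.

```python
def count_by_fold(ratios, fold):
    """Count genes induced or repressed by at least `fold`.

    `ratios` maps gene name to (sample / reference) expression ratio.
    """
    induced = sum(1 for r in ratios.values() if r >= fold)
    repressed = sum(1 for r in ratios.values() if r <= 1.0 / fold)
    return induced, repressed


# Placeholder ratios for illustration only.
example_ratios = {"g1": 9.0, "g2": 2.5, "g3": 1.1, "g4": 0.4, "g5": 0.2, "g6": 0.9}

twofold = count_by_fold(example_ratios, 2.0)
fourfold = count_by_fold(example_ratios, 4.0)
print(twofold, fourfold)
```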

The global view of changes in expres- 
sion of genes with known functions pro- 
vides a vivid picture of the way in which 
the cell adapts to a changing environ- 
ment. Figure 3 shows a portion of the yeast 
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes 
coding for the enzymes aldehyde dehydrogenase
(ALD2) and acetyl-coenzyme
A (CoA) synthase (ACS1), which function
together to convert the products of
alcohol dehydrogenase into acetyl-CoA, 
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away 
from acetaldehyde, and instead to oxalac- 
etate, where it can serve to supply the 
TCA cycle and gluconeogenesis. Induc- 
tion of the pivotal genes PCK1, encoding 
phosphoenolpyruvate carboxykinase, and 
FBP1, encoding fructose 1,6-biphos- 
phatase, switches the directions of two key 
irreversible steps in glycolysis, reversing 
the flow of metabolites along the revers- 
ible steps of the glycolytic pathway toward 
the essential biosynthetic precursor, glucose-6-phosphate.
Induction of the genes
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of 
genes encoding pivotal enzymes can pro- 
vide insight into metabolic reprogram- 
ming, the behavior of large groups of func- 
tionally related genes can provide a broad 
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate cycle
and carbohydrate storage, were coordinately
induced by glucose exhaustion. In
contrast, genes devoted to protein synthe- 
sis, including ribosomal proteins, tRNA 
synthetases, and translation, elongation, 
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy
and illuminating exception was that the 



genes encoding mitochondrial ribosomal 
genes were generally induced rather than 
repressed after glucose limitation, highlighting
the requirement for mitochondrial
biogenesis (13). As more is learned about 
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment 
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of ex- 
pression could be recognized, and sets of 
genes could be grouped on the basis of the 
similarities in their expression patterns. The 
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be 
inferred for sets of genes with similar expres- 
sion profiles. For example, seven genes 
showed a late induction profile, with mRNA 
levels increasing by more than ninefold at 



the last timepoint but less than threefold at
the preceding timepoint (Fig. 5B). All of
these genes were known to be glucose-repressed,
and five of the seven were previously
noted to share a common upstream activating
sequence (UAS), the carbon source response
element (CSRE) (16-20). A search
in the promoter regions of the remaining two
genes, ACR1 and IDP2, revealed that
ACR1, a gene essential for ACS1 activity,
also possessed a consensus CSRE motif, but
interestingly, IDP2 did not. A search of the
entire yeast genome sequence for the consensus
CSRE motif revealed only four additional
candidate genes, none of which
showed a similar induction.

Examples from additional groups of
genes that shared expression profiles are
illustrated in Fig. 5, C through F. The
sequences upstream of the named genes in
Fig. 5C all contain stress response elements
(STRE), and with the exception
of HSP42, have previously been shown to
be controlled at least in part by these
elements (21-24). Inspection of the sequences
upstream of HSP42 and the two
uncharacterized genes shown in Fig. 5C, 
YKL026c, a hypothetical protein with 
similarity to glutathione peroxidase, and 
YGR043c, a putative transaldolase, re- 
vealed that each of these genes also pos- 
sess repeated upstream copies of the stress- 
responsive CCCCT motif. Of the 13 ad- 
ditional genes in the yeast genome that 
shared this expression profile [including 
HSP30, ALD2, OM45, and 10 uncharacterized
ORFs (25)], nine contained one or
more recognizable STRE sites in their up- 
stream regions. 

The heterotrimeric transcriptional activator
complex HAP2,3,4 has been shown
to be responsible for induction of several
genes important for respiration (26-28).
This complex binds a degenerate consensus
sequence known as the CCAAT box (26).
Computer analysis, using the consensus sequence
TNRYTGGB (29), has suggested
that a large number of genes involved in
respiration may be specific targets of
HAP2,3,4 (30). Indeed, a putative
HAP2,3,4 binding site could be found in
the sequences upstream of each of the seven
cytochrome c-related genes that showed
the greatest magnitude of induction (Fig.
5D). Of 12 additional cytochrome c-related
genes that were induced, HAP2,3,4 binding
sites were present in all but one. Significantly,
we found that transcription of
HAP4 itself was induced nearly ninefold
concomitant with the diauxic shift.
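
A degenerate consensus such as TNRYTGGB, or a fixed motif such as the stress-responsive CCCCT, can be searched for by translating IUPAC nucleotide codes into a regular expression. The following is a minimal sketch, not the analysis used in the study: the function names and the two test sequences are invented for illustration, and only the given strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(consensus, sequence):
    """Return (position, matched text) for every site, including overlapping ones."""
    # A lookahead with a capture group lets overlapping matches be reported.
    lookahead = re.compile("(?=(" + consensus_to_pattern(consensus) + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence.upper())]


# Invented upstream sequences, for illustration only.
hap_sites = find_sites("TNRYTGGB", "GGTAACTGGCAT")
stre_sites = find_sites("CCCCT", "AACCCCTCCCCTG")
print(hap_sites, stre_sites)
```

Scanning the reverse complement as well would be needed to find sites on the opposite strand.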

Control of ribosomal protein biogenesis 
is mainly exerted at the transcriptional 
level, through the presence of a common 
upstream-activating element (UASrpg)
that is recognized by the Rap1 DNA-binding
protein (31, 32). The expression profiles
of seven ribosomal proteins are shown
in Fig. 5F. A search of the sequences 
upstream of all seven genes revealed consensus
Rap1-binding motifs (33). It has
been suggested that declining Rap1 levels
in the cell during starvation may be re- 
sponsible for the decline in ribosomal pro- 
tein gene expression (34). Indeed, we ob- 
served that the abundance of RAP1 
mRNA diminished by 4.4-fold, at about 
the time of glucose exhaustion. 

Of the 149 genes that encode known or 
putative transcription factors, only two, 
HAP4 and SIP4, were induced by a factor of 
more than threefold at the diauxic shift. 
SIP4 encodes a DNA-binding transcrip- 
tional activator that has been shown to 
interact with Snf1, the "master regulator" of
glucose repression (35). The eightfold induction
of SIP4 upon depletion of glucose
strongly suggests a role in the induction of
downstream genes at the diauxic shift.

Although most of the transcriptional 
responses that we observed were not previously
known, the responses of many
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were re- 
producible in duplicate experiments. For 
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expression
ratios measured in these duplicate
experiments differed by less than a factor
of 2. However, in a few cases, there were
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
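
The two reproducibility figures quoted above (a correlation coefficient between duplicate sets of expression ratios, and the fraction of genes whose duplicate ratios agree within a factor of 2) can be computed along the following lines. This is a minimal sketch, not the authors' analysis: the function names and the toy ratios are hypothetical, and the source does not say whether the correlation was taken on raw or log-transformed ratios, so the log2 transform used in the example is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fraction_within_factor(ratios_a, ratios_b, factor=2.0):
    """Fraction of genes whose duplicate ratios differ by less than `factor`."""
    within = sum(
        1 for a, b in zip(ratios_a, ratios_b) if max(a, b) / min(a, b) < factor
    )
    return within / len(ratios_a)


# Toy expression ratios for four genes in two duplicate hybridizations
# (invented numbers, for illustration only).
run_1 = [1.0, 2.0, 4.0, 0.5]
run_2 = [1.1, 1.8, 9.0, 0.45]

# Assumption: correlate log2 ratios so induction and repression are symmetric.
log_1 = [math.log2(r) for r in run_1]
log_2 = [math.log2(r) for r in run_2]
print(pearson(log_1, log_2))
print(fraction_within_factor(run_1, run_2))
```

Comparing ratios as max/min keeps the factor-of-2 test symmetric, so a gene measured at 0.5 and 1.0 is treated the same as one measured at 2.0 and 1.0.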

The changes in gene expression during 
this diauxic shift are complex and involve 
integration of many kinds of information 
about the nutritional and metabolic state 
of the cell. The large number of genes 
whose expression is altered and the diver- 
sity of temporal expression profiles ob- 
served in this experiment highlight the 
challenge of understanding the underlying 
regulatory mechanisms. One approach to 
defining the contributions of individual 
regulatory genes to a complex program of 
this kind is to use DNA microarrays to 
identify genes whose expression is affected 



Fig. 2. The section of the array indicated by the gray box
in Fig. 1 is shown for each of the experiments described
here. Representative genes are labeled. In each of the arrays
used to analyze gene expression during the diauxic
shift, red spots represent genes that were induced relative
to the initial timepoint, and green spots represent
genes that were repressed relative to the initial timepoint.
In the arrays used to analyze the effects of the tup1Δ mutation
and YAP1 overexpression, red spots represent
genes whose expression was increased, and green spots
represent genes whose expression was decreased by
the genetic modification. Note that distinct sets of genes are
induced and repressed in the different experiments. The
complete images of each of these arrays can be viewed on
the Internet (13). Cell density as measured by optical density
(OD) at 600 nm was used to measure the growth of the
culture.

by mutations in each putative regulatory
gene. As a test of this strategy, we analyzed
the genomewide changes in gene expression
that result from deletion of the TUP1 gene.
Transcriptional repression of many genes by
glucose requires the DNA-binding repressor
Mig1 and is mediated by recruiting the transcriptional
co-repressors Tup1 and Cyc8/Ssn6
(39). Tup1 has also been implicated in
repression of oxygen-regulated, mating-type-specific,
and DNA damage-inducible genes
(40).

Wild-type yeast cells and cells bearing
a deletion of the TUP1 gene (tup1Δ) were
grown in parallel cultures in rich medium
containing glucose as the carbon source.
Messenger RNA was isolated from exponentially
growing cells from the two populations
and used to prepare cDNA labeled
with Cy3 (green) and Cy5 (red),
respectively (11). The labeled probes were
mixed and simultaneously hybridized to
the microarray. Red spots on the microarray
therefore represented genes whose
transcription was induced in the tup1Δ
strain, and thus presumably repressed by
Tup1 (41). A representative section of the
microarray (Fig. 2, bottom middle panel)
illustrates that the genes whose expression
was affected by the tup1Δ mutation were,
in general, distinct from those induced
upon glucose exhaustion [complete images
of all the arrays shown in Fig. 2 are available
on the Internet (13)]. Nevertheless,
34 (10%) of the genes that were induced
by a factor of at least 2 after the diauxic
shift were similarly induced by deletion of
TUP1, suggesting that these genes may be
subject to TUP1-mediated repression by
glucose. For example, SUC2, the gene encoding
invertase, and all five hexose transporter
genes that were induced during the
course of the diauxic shift were similarly
induced, in duplicate experiments, by the
deletion of TUP1.

The set of genes affected by Tup1 in this
experiment also included α-glucosidases,
the mating-type-specific genes MFA1 and
MFA2, and the DNA damage-inducible
RNR2 and RNR4, as well as genes involved
in flocculation and many genes of unknown
function. The hybridization signal corresponding
to expression of TUP1 itself was
also severely reduced because of the (incomplete)
deletion of the transcription unit
in the tup1Δ strain, providing a positive
control in the experiment (42).

Many of the transcriptional targets of
Tup1 fell into sets of genes with related
biochemical functions. For instance, although
only about 3% of all yeast genes
appeared to be TUP1-repressed by a factor
of more than 2 in duplicate experiments
under these conditions, 6 of the 13 genes
that have been implicated in flocculation
(15) showed a reproducible increase in
expression of at least twofold when TUP1
was deleted. Another group of related
genes that appeared to be subject to TUP1
repression encodes the serine-rich cell
wall mannoproteins, such as Tip1 and
Tir1/Srp1, which are induced by cold
shock and other stresses (43), and similar,
serine-poor proteins, the seripauperins
(44). Messenger RNA levels for 23 of the
26 genes in this group were reproducibly
elevated by at least 2.5-fold in the tup1Δ
strain, and 18 of these genes were induced
by more than sevenfold when TUP1 was
deleted. In contrast, none of 83 genes that
could be classified as putative regulators of
the cell division cycle were induced more
than twofold by deletion of TUP1. Thus,
despite the diversity of the regulatory systems
that employ Tup1, most of the genes
that it regulates under these conditions
fall into a limited number of distinct functional
classes.
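
To get a feel for how strongly such a functional class stands out, one can ask how often 6 or more of 13 genes would exceed the threshold by chance if each gene independently had the roughly 3% background probability quoted above. The binomial tail below is an illustrative calculation only and is not reported in the source; the independence assumption, the choice of a binomial rather than a hypergeometric model, and the function name are all assumptions made for this sketch.

```python
from math import comb


def binomial_tail(k, n, p):
    """Probability of at least k successes in n independent trials of probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 6 of 13 flocculation genes induced, against a background rate of about 3%.
p_flocculation = binomial_tail(6, 13, 0.03)
print(f"{p_flocculation:.2e}")
```

Under these assumptions the tail probability is on the order of one in a million, consistent with the text's point that Tup1 targets cluster in a few functional classes rather than being scattered at random.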

Because the microarray allows us to
monitor expression of nearly every gene in
yeast, we can, in principle, use this approach
to identify all the transcriptional
targets of a regulatory protein like Tup1. It
is important to note, however, that in any
single experiment of this kind we can only
recognize those target genes that are normally
repressed (or induced) under the
conditions of the experiment. For instance,
the experiment described here analyzed
a MATα strain in which MFA1
and MFA2, the genes encoding the a-factor
mating pheromone precursor, are
normally repressed. In the isogenic tup1Δ
strain, these genes were inappropriately
expressed, reflecting the role that Tup1
plays in their repression. Had we instead
carried out this experiment with a MATa
strain (in which expression of MFA1 and
MFA2 is not repressed), it would not have
been possible to conclude anything regarding
the role of Tup1 in the repression
of these genes. Conversely, we cannot distinguish
indirect effects of the chronic
absence of Tup1 in the mutant strain from
effects directly attributable to its participation
in repressing the transcription of a
gene.

Another simple route to modulating the
activity of a regulatory factor is to overexpress
the gene that encodes it. YAP1 encodes
a DNA-binding transcription factor
belonging to the b-zip class of DNA-binding
proteins. Overexpression of YAP1 in
yeast confers increased resistance to hydrogen
peroxide, o-phenanthroline, heavy
metals, and osmotic stress (45). We analyzed
differential gene expression between a
wild-type strain bearing a control plasmid
and a strain with a plasmid expressing YAP1
under the control of the strong GAL1-10
promoter, both grown in galactose (that is,
a condition that induces YAP1 overexpression).
Complementary DNA from the control
and YAP1 overexpressing strains, labeled
with Cy3 and Cy5, respectively, was
prepared from mRNA isolated from the two
strains and hybridized to the microarray.
Thus, red spots on the array represent genes
that were induced in the strain overexpressing
YAP1.

Of the 17 genes whose mRNA levels
increased by more than threefold when
YAP1 was overexpressed in this way, five
bear homology to aryl-alcohol oxidoreductases
(Fig. 2 and Table 1). An additional
four of the genes in this set also belong to
the general class of dehydrogenases/oxidoreductases.
Very little is known about
the role of aryl-alcohol oxidoreductases in
S. cerevisiae, but these enzymes have been
isolated from ligninolytic fungi, in which
they participate in coupled redox reactions,
oxidizing aromatic and aliphatic
unsaturated alcohols to aldehydes with the
production of hydrogen peroxide (46, 47).
The fact that a remarkable fraction of the
targets identified in this experiment belong
to the same small, functional group of
oxidoreductases suggests that these genes
might play an important protective role
during oxidative stress. Transcription of a
small number of genes was reduced in the
strain overexpressing Yap1. Interestingly,
many of these genes encode sugar permeases
or enzymes involved in inositol
metabolism.

We searched for Yap1-binding sites
(TTACTAA or TGACTAA) in the sequences
upstream of the target genes we
identified (48). About two-thirds of the
genes that were induced by more than
threefold upon Yap1 overexpression had
one or more binding sites within 600 bases
upstream of the start codon (Table 1), suggesting
that they are directly regulated by
Yap1. The absence of canonical Yap1-binding
sites upstream of the others may reflect
an ability of Yap1 to bind sites that differ
from the canonical binding sites, perhaps in
cooperation with other factors, or, less likely,
may represent an indirect effect of Yap1
overexpression, mediated by one or more
intermediary factors. Yap1 sites were found
only four times in the corresponding region
of an arbitrary set of 30 genes that were not
differentially regulated by Yap1.
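
The site search described in this paragraph can be sketched as a simple exact-match scan of the 600 bases immediately upstream of a start codon. The code below is an illustrative assumption about how such a scan might look, not the procedure the authors used: the function names are hypothetical, the input is assumed to be the 5'-to-3' sequence ending just before the start codon, and counting sites on both strands is a choice made for this sketch.

```python
# The two Yap1-binding sequences named in the text.
YAP1_SITES = ("TTACTAA", "TGACTAA")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_yap1_sites(upstream, window=600):
    """Count Yap1 sites on either strand in the last `window` bases of `upstream`.

    `upstream` is assumed to read 5'->3' and to end immediately before
    the start codon, so the final `window` bases are the region of interest.
    """
    region = upstream.upper()[-window:]
    # A site on the opposite strand appears as its reverse complement here.
    motifs = set(YAP1_SITES) | {reverse_complement(m) for m in YAP1_SITES}
    count = 0
    for motif in motifs:
        start = region.find(motif)
        while start != -1:
            count += 1
            start = region.find(motif, start + 1)
    return count


# Hypothetical upstream fragment, invented for illustration only.
print(count_yap1_sites("GGTTACTAAGGCCTTAGTCACC"))
```

Running the same count over a set of genes that did not respond to Yap1 overexpression would give a background rate to compare against, in the spirit of the 30-gene control mentioned above.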

Use of a DNA microarray to characterize
the transcriptional consequences of
mutations affecting the activity of regulatory
molecules provides a simple and powerful
approach to dissection and characterization
of regulatory pathways and networks.
This strategy also has an important
practical application in drug screening.
Mutations in specific genes encoding candidate
drug targets can serve as surrogates
for the ideal chemical inhibitor or modulator
of their activity. DNA microarrays
can be used to define the resulting signature
pattern of alterations in gene expression,
and then subsequently used in an
assay to screen for compounds that reproduce
the desired signature pattern.

DNA microarrays provide a simple and 
economical way to explore gene expres- 
sion patterns on a genomic scale. The 
hurdles to extending this approach to any 
other organism are minor. The equipment
required for fabricating and using DNA
microarrays (9) consists of components
that were chosen for their modest cost and
simplicity. It was feasible for a small group
to accomplish the amplification of more
than 6000 genes in about 4 months and,
once the amplified gene sequences were in
hand, only 2 days were required to print a
set of 110 microarrays of 6400 elements
each. Probe preparation, hybridization, 
and fluorescent imaging are also simple 
procedures. Even conceptually simple ex- 
periments, as we described here, can yield 
vast amounts of information. The value of 
the information from each experiment of 
this kind will progressively increase as 
more is learned about the functions of 
each gene and as additional experiments 
define the global changes in gene expres- 
sion in diverse other natural processes and 
genetic perturbations. Perhaps the greatest 
challenge now is to develop efficient 
methods for organizing, distributing, inter- 
preting, and extracting insights from the 
large volumes of data these experiments 
will provide.

Human Genome Placed on Chip; Biotech Rivals Put It Up
for Sale

By ANDREW POLLACK (NYT) 1030 words
The genome on a chip has arrived.

Melding high technology with biology, several companies are rushing to sell slivers of glass or 
nylon, some as small as postage stamps, packed with pieces of all 30,000 or so known human 
genes. 

The new products will allow scientists to scan all genes in a human tissue sample at once to 
determine which genes are active, a job that previously required two or more chips. The whole-genome
chips will lower the cost and increase the speed of a widely used test that has
transformed biomedical research in the last few years. 

"It's sort of a milestone event, very similar to generating an integrated circuit of the genome," 
said Stephen P. A. Fodor, the chief executive of Affymetrix Inc., the leading seller of gene chips,
which are also called microarrays.

Affymetrix, based in Santa Clara, Calif., is expected to announce today that it is accepting orders 
for its whole-genome chip.

The announcement seems timed to steal some thunder from the rival Agilent Technologies,
which is based in nearby Palo Alto. Agilent is to be the host of an analyst meeting today, and it
plans to announce then that it has started shipping test versions of its whole-genome chip. 

Applied Biosystems of Foster City, Calif., a unit of the Applera Corporation, started the race in 
July with an announcement that it would have a whole-genome chip out by the end of this year.
NimbleGen Systems, a small company in Madison, Wis., announced a few days later that it had a 
genome on a chip that it was not selling but that it was using to run tests for customers. 

Gene chips, which detect genes that are active, meaning they are being used to make a protein,
have become essential tools. Scientists try to understand the genetic mechanisms of disease by 
seeing which genes are turned on in, say, a sick kidney or lung compared with those active in a 
healthy organ. Pharmaceutical companies look at gene activity patterns to try to predict the 
effects of drugs.

Scientists have found that tumors that look the same under the microscope can differ in terms of
which genes are active. So studying gene patterns could become a way to discriminate between
deadly and not-so-deadly tumors, or to predict which drug will work best for a particular patient.

[...] vendors conceded [...] change from [...] chips to one [...] more symbolic [...]

"You can do just as good science with two chips, it costs you a little more," said Roland Green,
the vice president for research and development at NimbleGen. 

Some scientists questioned whether the chips really have all human genes, because the exact 
number and identities of all the genes is not known.

The advent of the genome on a chip is, however, evidence that biotechnology, to the extent that it 
uses electronics, is experiencing some of the rapid progress that has made semiconductors and
computers continuously cheaper and smaller.

"One of the effects everyone is looking for in the genomics area is Moore's law - more data, less 
money," said Doug Dolginow, an executive vice president at Gene Logic, which sells data from
gene chip studies to pharmaceutical companies. "This is a step in that direction." 

Moore's law states that the number of transistors on a semiconductor chip doubles every 18
months.

[...] made with the same techniques used to make semiconductor
chips. In the mid-1990s, the company came out with a set of five chips covering what was then
known of the human genome. After the human genome sequence was virtually completed in
2000, the company developed a two-chip set with all the known genes. Now it has the single
chip, which some scientists say will be more convenient. 

"[...] look at all genes at one time to get a global view of what's going on," said
John R. Walker, who runs gene chip operations at the Genomics Institute of the Novartis
Research Foundation in San Diego. 

Costs should also be lower. Gene chips have been so expensive that many academic scientists
[...] them. Affymetrix said it would sell the whole-genome [...]
for $300 to $500 each depending on volume, little more than half the price of the two-chip set.
The other companies have not announced prices.

For Affymetrix a successful whole-genome chip "is essential for them to maintain their
[...] microarrays," said Edward A. Tenthoff, an analyst at U.S. Bancorp Piper
Jaffray. Affymetrix had total product sales in 2002 of about $250 million, and a company
spokesman said that human genome chips are its top-selling product. 

Mr. Tenthoff, who recommends Affymetrix stock, said the company's sales growth rate had
moderated as it faces tougher competition. Agilent, a spinoff of Hewlett-Packard that makes its
gene chips by printing DNA components onto glass slides using ink jet printers, has gained
share, he said. Applied Biosystems, the largest maker of genomics equipment over all, will be
entering the microarray segment of the business with its whole-genome chip, emphasizing the
connection of that product to the others it offers, including the gene database developed by its 
sister company, Celera Genomics. 

Jeffrey Trent, scientific director of the Translational Genomics Research Institute in Phoenix, 
said that while whole-genome chips are useful for medical discovery, the biggest growth of the 
market will be for chips that can be used by doctors to do diagnoses. And whole-genome chips 
are too cumbersome for that, he said. Rather, once scientists use the whole-genome chips to find 
particular genes that are associated with, say, tumor aggressiveness or drug effectiveness, he
said, they will then make smaller and cheaper chips containing just those genes for use in
diagnosis.

Agilent Technologies ships whole human genome on single
microarray to gene expression customers for evaluation

Company to introduce first commercial whole human [...] microarray by end [...]

PALO ALTO, Calif., Oct. 2, 2003

[...] double-density format, which can accommodate [...] probes. The cost savings
and high-quality performance make this product a compelling alternative for scientists who make their
own microarrays."

Agilent's microarrays are based on the industry-standard 1" x [...] format, which is
compatible with most commercial microarray scanners. All Agilent [...] microarrays are developed
using content from public databases and [...], with sequence and annotation
information made available to customers. Gene sequences [...] are developed using algorithms
and then validated empirically through [...] procedures. The result is a microarray
comprised of functionally validated probes with the most up-to-date and comprehensive genome
information commercially available.

Advantages of the double-density format include:

• [...] reagents and [...]

• [...] sample use. A [...] of sample [...] required to perform an experiment.

Availability

Agilent's Whole Human Genome Microarray is expected to be available for order by the end of the year.

[...] end of 2003) that involve risks and uncertainties that could cause results to differ materially [...]

SANTA CLARA, Calif., Oct. 2 /PRNewswire/ - Affymetrix Inc.
(Nasdaq: AFFX) announced today that it is [...]
GeneChip(R) brand Human Genome U133 Plus 2.0 Array, offering researchers the
protein-coding content of the human [...] commercially available
catalog microarray. The HG-U133 Plus 2.0 Array [...] level
of nearly 50,000 RNA transcripts and variants with [...]
transcript, providing superior data quality unmatched by [...]

The third enactment of Cambridge
Healthtech Institute's Macroresults
through Microarrays meeting was held
in Boston (MA, USA) from 29 April-
1 May 2002. The subtheme of this year's
meeting was 'advancing drug discovery',
a widely touted application for
array technology.



The evolution of microarrays 
If you were asked 'Who first conceived 
of the idea of microarrays', who would
come to mind? Mark Schena perhaps, 
first author of the seminal 1995 paper 
on cDNA arrays [1]? Maybe Pat Brown, 
Schena's then supervisor? Or perhaps 
Stephen Fodor, the primary driver 
behind Affymetrix's (http://www.
affymetrix.com) oligonucleotide-based
platform [2]. Brits might even chant the 
name of Ed Southern [3]. Well, accord- 
ing to Roger Ekins (University College 
London Medical School; http://www. 
ucl.ac.uk/medicine/) all these answers 
would be wrong. It was in fact Ekins 
and his colleagues who first conceived 
of and patented 'a new generation of 
ultrasensitive, miniaturized assays for 
protein and DNA-RNA measurement 
based on the use of microarrays' in the 
mid 1980s [4]. The concept and potential
of array technology was more fully
described in a later publication, in
which Ekins et al. [5] concluded that antibody
microspots of ~50 µm² could be
achieved, and that as many as 2 million
different immunoassays could, in principle,
be accommodated on a surface
area of 1 cm².

Technological innovation
In practice, it took a different biological
molecule (DNA), a different research
group, and a leap into microfabrication
technology to even begin
approaching these kinds of densities
[Affymetrix patent 6045996 talks of
one million spots cm⁻²]. Of course,
advancing technology is one of the 
driving engines behind the genomics 
juggernaut, and we are already seeing 
'4th generation' machines for fab- 
ricating DNA chips. If the company 
representatives at this meeting are to 
be believed (and their cases seemed 
strong), spotting is out, and in situ 
fabrication of oligonucleotide-based 
'iterative custom arrays' is in. Whether 
you go with the Combimatrix's (http:// 
www.combimatrix.com) electrochemi- 
cally directed synthesis and detection 
system, febit's (http://www.febit.com) 
Geniom® technology, or Nimblegen's 
(http://www.nimblegen.com) Maskless 
Array Synthesizer technology is a 
matter of personal choice. However, 
each of these machines provides the 
flexibility to design variable length 
oligonucleotide probes from se- 
quences inputted by the user, and then 
perform in situ synthesis of an array. 
Each system also boasts unique advan- 
tages. For example, Combimatrix's 
biological array processor is a semi- 
conductor coated with a 3D layer 
of porous material in which DNA, 
RNA, peptides or small molecules 
can be synthesized or immobilized 
within discrete test sites, while febit's 
Geniom One® is a fully integrated
gene-expression analysis system with 
minimal user hands-on time - the 
probe sequences are programmed, the 
RNA samples inserted, and the gene 
expression data is pumped out a few 
hours later. 



Cell- and tissue-based arrays 
Array technology is in most people's 
minds firmly linked with gene-expression 
profiling. Fewer are aware that cell- and 
tissue-based arrays have been devel- 
oped, and how they can provide 
a vital extra dimension to research. In 
support of this, Barry Bochner gave an 
update on the cell-based array system 
that Biolog (http://www.biolog.com) 
has produced for simultaneously mea- 
suring the effects of one gene in the cell 
under thousands of growth conditions 
(see [6] for further details). David Walt 
(Tufts University; http://www.tufts. 
edu/) is developing single live cell ar- 
rays using optical imaging fiber (OIF) 
technology. An array of microwells is 
fabricated on the face of an OIF at densities
of up to 10 million wells cm⁻².
Cells are then added to the wells and 
disperse at an average of one cell per 
well. Physiological and genetic re- 
sponses of each cell are measured via 
fluorescence produced by reporter 
genes (e.g. lacZ, gfp). Assays performed
so far include yeast live or dead cell
assay, microenvironment pH and
O2 measurements, promoter responses
using the lacZ and phoA reporter genes,
and protein-protein interactions using 
the yeast two-hybrid system. The main 
advantage of this system is that the cells 
remain alive during the assay, which 
means a real-time timecourse can be 
performed and/or the array passed 
from sample to sample. This would be 
useful in, for example, the scanning of 
a combinatorial drug library for specific
physiological effects. 

Tissue arrays are a useful complementary
technology to DNA arrays because
they can be used to help validate and
understand the biological and medical
significance of gene changes discovered
using standard DNA arrays. For
example, an array of tumor tissues can
be screened for the protein (using im- 
munohistochemistry), message (using 
in situ hybridization) and copy number 
(using comparative genomic hybridization)
of a gene of interest, to determine
if expression of the gene (or lack 
thereof) is related in any way to sur- 
vival. They can also be used to predict 
the probability of clinical failure of lead 
compounds as a result of toxicity by 
evaluating the distribution of the drug 
targets in normal tissue. Spyro Mousses 
and his co-workers at the National 
Human Genome Research Institute 
(http://www.nhgri.nih.gov/index.html) 
have built such arrays, including a 
multi-tumor array (~5000 specimens,
and sections from 36 normal and 800 
metastatic tissues) and a normal tissue 
array (76 tissue and 332 cell types). 

The problem with proteins 
It has been said that genomics tells us 
what might happen, transcriptomics 
indicates what should happen, and pro- 
teomics shows what is happening. The 
impact of functional proteomics on 
pharmaceutical R&D is rapidly increas- 
ing, and protein arrays are being used 
increasingly in both basic and applied 
research. Their use lies not only in com- 
parative protein expression and inter- 
action profiling, but also in diagnostics 
and drug discovery. However, an in- 
creasing number of researchers have 
found that protein arrays, like their 
cousins the DNA arrays, present several 
practical obstacles relating to their production
and use. For example, in using
Escherichia coli to produce recombinant
eukaryotic proteins from a single
expression vector, multiple protein
products are often produced, suggesting
mixes of truncated or otherwise
altered proteins. There is also the obvious
concern that the proteins might
not be modified in a similar manner to 



eukaryotic systems. Also, an optimal 
method for depositing and binding 
proteins to the selected substrate is 
yet to be determined, as is the best 
way to ensure that they are bound in a 
correctly folded, active conformation. 

Several companies have been address- 
ing these problems. Prolinx (http:// 
www.prolinxinc.com) is one such com- 
pany, and Karin Hughes described their 
Versalinx™ chemistry for producing 
protein, peptide and small-molecule 
arrays. Versalinx™ uses solution-phase 
conjugation followed by immobiliza- 
tion, resulting in functional orientation 
of proteins and peptides on the sub- 
strate surface. It also offers the valuable 
additional benefit of exhibiting low 
non-specific binding. Sense Proteomic 
(http://www.senseproteomic.com) is 
also among those addressing these 
problems to develop robust protein 
arrays for drug discovery and clinical 
applications and has developed func- 
tional protein array formats based on 
specific disease tissues. Subtractive hy- 
bridization is used to identify genes 
with altered expression in breast tumor 
and cystic fibrosis compared to normal 
tissue. A high throughput cloning strat- 
egy (COVET™) is then used to produce 
libraries of genes that are tagged, 
cloned, expressed, purified and finally 
immobilized on glass slides. Initial vali- 
dation studies have shown that the vast 
majority of the immobilized proteins do 
indeed retain biological function. 

Stefan Schmidt and his company 
(GPC Biotech; http://www.gpcbiotech.
de) have moved past the platform devel- 
opment stage and, with their focus 
firmly on drug discovery, are currently 
developing kinase-profiling arrays. 
Kinases are important targets for phar- 
maceutical drug discovery and therapy, 
and GPC's aim is to simultaneously de- 
tect multiple kinases, obtain activity pro- 
files for different cell types, or analyze 
the ability of drug candidates to inhibit 
kinase activity. To do this, recombinant 
kinase substrates are immobilized on 



membranes, incubated with purified 
kinase, and the substrates measured for
the degree of phosphorylation. 

Summary 

Meetings like this, packed with exciting 
discoveries and intriguing and interest- 
ing innovation, heavily emphasize the 
pace at which biotechnology is advanc- 
ing, to the extent that the number of 
options for genomic and proteomic re- 
searchers can become overwhelming. 
Although data analysis is perhaps the 
greatest current concern for array users, 
an increasing challenge will be to deter- 
mine the approaches and technology 
that really work, and to do it in a timely 
manner. 

References 

1 Schena, M. et al. (1995) Quantitative
monitoring of gene expression patterns 
with a complementary DNA microarray. 
Science 270, 467-470 

2 Fodor, S.R et aL (1991) Light -directed, 
spatially addressable parallel chemical 
synthesis. Science 251, 767-773 

3 Southern, E.M. et al. (1992) Analyzing and 
comparing nucleic acid sequences by 
hybridization to arrays of oligonucleotides: 
evaluation using experimental models. 
Genomics 13, 1008-1017 

4 Ekins, R.P. (1987) US Patent Application 8 
803 000 

5 Ekins, R. et at. (1989) High specific activity 
chemiluminescent and fluorescent markers: 
their potential application to high 
sensitivity and 'multi-analyte' 
immunoassays. /. Bioium. Chemilum. 4, 
59-78 

6 Rockett, J.C. (2002) Chip, chip, array! Three 
chips for posi-genomic research. Drug 
Discov. Today 7, 458-459 

Acknowledgements 
I would like to thank Mary Ann Brown 
(Cambridge Healthtech Institute) and 
David Dix (US EPA) for critical review 
of this manuscript prior to submission. 
This document has been reviewed in 
accordance with US Environmental 
Protection Agency policy and ap- 
proved for publication. Mention of 
companies, trade names or products 
does not signify endorsement of such 
by the EPA. 



www.drugdiscoverytoday.com



Electrophoresis 1991, 12, 907-930

N. Leigh Anderson

Ricardo Esquer-Blasco
Jean-Paul Hofmann
Norman G. Anderson



NOTICE: THIS MATERIAL MAY BE PROTECTED
BY COPYRIGHT LAW (TITLE 17 U.S. CODE)

A two-dimensional gel database of rat liver proteins
useful in gene regulation and drug effects studies






Large Scale Biology Corporation,

Rockville, MD 

Lai et al., 09/002,485, filed December 31, 1997 
(PF-0459) 

Exhibit "L" attached to Declaration of John C.
Rockett, Ph.D. 



A two-dimensional (2-D) protein map of Fischer 344 rat liver
(F344MST3) is presented, with a tabular listing of more than 1200 protein species.
Sodium dodecyl sulfate (SDS) molecular mass and isoelectric point have been
established, based on positions of numerous internal standards. This map has been
used to connect and compare hundreds of 2-D gels of rat liver samples from a
variety of studies, and forms the nucleus of an expanding database describing rat
liver proteins and their regulation by various drugs and toxic agents. An example
of such a study, involving regulation of cholesterol synthesis by cholesterol-lowering
drugs and a high-cholesterol diet, is presented. Since the map has been
obtained with a widely used and highly reproducible 2-D gel system (the Iso-Dalt®
system), it can be directly related to an expanding body of work in other laboratories.


Contents 

1 Introduction

2 Materials and methods

2.1 Sample preparation

2.2 Two-dimensional electrophoresis

2.3 Staining

2.4 Positional standardization

2.5 Computer analysis

2.6 Graphical data output

2.7 Experiment LSBC04

3 Results and discussion

3.1 The rat liver protein 2-D map

3.2 Carbamylated charge standards, computed pI's
and molecular mass standardization

3.3 An example of rat liver gene regulation: Cholesterol
metabolism

3.3.1 MSN 413 (putative cytosolic HMG-CoA
synthase) and sets of spots regulated
coordinately or inversely

3.3.2 MSN 235 and coregulated spots

3.3.3 An example of an anti-synergistic effect

3.3.4 Complexity of the cholesterol synthesis
pathway

4 Conclusions

5 References

6 Addendum 1: Figures 1-13

7 Addendum 2: Tables 1-4

Table 1. Master table of proteins in rat liver database

Table 2. Table of some identified proteins

Table 3. Computed pI's of two sets of carbamylated
protein standards: rabbit muscle CPK and
human Hb

Table 4. Computed pI's of some known proteins
related to measured CPK pI's



Correspondence: Dr. N. Leigh Anderson, Large Scale Biology Corporation,
9620 Medical Center Drive, Rockville, MD 20850, USA

Abbreviations: CBB, Coomassie Brilliant Blue; CPK, creatine phosphokinase;
2-D, two-dimensional; IEF, isoelectric focusing; MSN, master
spot number; NP-40, Nonidet P-40; SDS, sodium dodecyl sulfate

© VCH Verlagsgesellschaft mbH, D-6940 Weinheim, 1991



1 Introduction 

High-resolution two-dimensional electrophoresis of proteins,
introduced in 1975 by O'Farrell and others [1-4], has
been used over the ensuing 16 years to examine a wide variety
of biological systems, with results appearing in more
than 5000 published papers. With the advent of computer- 
ized systems for analyzing two-dimensional (2-D) gel ima- 
ges and constructing spot databases, it is also possible to 
plan and assemble integrated bodies of information de- 
scribing the appearance and regulation of thousands of pro- 
tein gene products [5, 6]. Creating such databases involves 
amassing and organizing quantitative data from thousands 
of 2-D gels, and requires a substantial commitment in tech- 
nology and resources. 

Given the long-term effort required to develop a protein database,
the choice of a biological system takes on considerable
importance. While in vitro systems are ideal for answering
many experimental questions, especially in cancer research
and genetics, our experience with cell cultures and
tissue samples suggests that some in vivo approaches could
have major advantages. In particular, we have noticed that
liver tissue samples from rats and mice appear to show greater
quantitative reproducibility (in terms of individual protein
expression) than replicate cell cultures. This is perhaps
a natural result of the homeostasis maintained in a complete
animal vs. the well-known variability of cell cultures,
the latter due principally to differences in reagents (e.g.,
fetal bovine serum), conditions (e.g., pH) and genetic "evolution"
of cell lines while in culture. It is also more difficult
to generate adequate amounts of protein from cell culture
systems (particularly with attached cells), forcing the investigator
to resort to radioisotope-based or silver-based stain
detection methods. While these methods are more sensitive
(sometimes much more sensitive) than the Coomassie
Brilliant Blue (CBB) stain typically used for protein detection
in "large" protein samples, they are generally more variable,
more labor-intensive and, in the case of radiographic
methods, may generate highly "noisy" images, due to the
properties of the films used. By contrast, large protein samples
can easily be prepared from liver using urea/Nonidet
P-40 (NP-40) solubilization and stained with CBB, which
has the advantage of being easily reproducible [8]. Finally,
there remains the question of the "truthfulness" of many in
vitro systems as compared to their in vivo analogs: how
great are the changes caused by the introduction into culture
and the associated shift to strong selection for growth,
and how do these affect experimental outcomes? Hence
the apparent advantages of in vitro systems, in terms of
experimental manipulation, may be counterbalanced by
other factors relating to 2-D data quality.

There is a second important class of reasons for exploring 
the use of an in vivo biological system such as the liver. His- 
torically there have been two broad approaches to the me- 
chanistic dissection of biochemical processes in intact cel- 
lular systems: genetics (a search for informative mutants) 
and the use of chemical agents (drugs and chemical toxins). 
Both approaches help us to understand complex systems
by disrupting some specific functional element and showing
us the result. With the development of techniques for
genetic manipulation and cloning, the genetic approach
can be effectively applied either in vitro or in vivo, although
the in vitro route is usually quicker. The chemical approach 
can also be applied to either sort of biological system; here, 
however, the bulk of consistently acquired information is
in experimental animals (rats and mice). While most biologists
know a short list of compounds having specific, experimentally
useful effects (e.g., inhibitors of protein synthesis,
ionophores, polymerase inhibitors, channel blockers, nucleotide
analogs, and compounds affecting polymerization
of cytoskeletal proteins), there is a much larger number of
interesting chemically-induced effects, most of them char- 
acterized by toxicologists and pharmacologists in rodent 
systems. Just as a thorough genetic analysis would involve 
saturating a genome with mutations, it is possible to ima- 
gine a saturating number of drugs, the analysis of whose ac- 
tions would reveal the complete biochemistry of the cell. 
While organized drug discovery efforts usually target spe- 
cific desired effects, the nature of the process, with its de- 
pendence on screening large numbers of compounds, ne- 
cessarily produces many unanticipated effects. It is there- 
fore reasonable to suppose that the required broad range of 
compounds necessary to achieve "biochemical saturation"
may be forthcoming; in fact, it may already exist among the 
hundreds of thousands of compounds that failed to qualify 
as drugs. 

Among organs, the liver is an obvious choice for the study 
of chemical effects because of its well-known plasticity and 
responsiveness. The brain appears to be quite plastic (e.g.
[7]), but it is a complicated mixture of cell types requiring 
skillful dissection for most experiments. The kidney, while 
quite responsive, also presents a potentially confounding 
mixture of cell types. The liver, by contrast, is made up of 
one predominant cell type which is easy to solubilize: the
hepatocyte, representing more than 95% of its mass. Most 
importantly, the liver performs many homeostatic func- 
tions that require rapid modulation of gene expression. It 
appears that most chemical agents tested affect gene ex- 
pression in the liver at some dosage (N. Leigh Anderson, 
unpublished observations), an interesting contrast to our 
earlier work with lymphocytes, for example, which seem to 
be much less responsive. Such results conform to the expec- 
tation that cells with a homeostatic, physiological role 
should be more plastic than cells differentiated for a pur- 
pose dependent on the action of a limited number of spe- 
cific genes. 

The liver also allows the parallels between in vitro and in
vivo systems to be examined in detail. Significant progress
has been made in the development of mouse, rat and human
hepatocyte culture systems, as well as in precision-cut
tissue slices. Using such an array of techniques, it is possible
to assemble a matrix of mammalian systems including
mouse and rat in vivo on one level and mouse, rat and hu- 
man in vitro on a second level, and to compare effects be- 
tween species and between systems. This approach allows 
us to draw informed conclusions regarding the biochemical 
"universality" of biological responses among the mammals, 
and to offer some insight into the validity of in vitro ap- 
proaches for toxicological screening. We believe this data 
will be necessary if in vitro alternatives are to achieve wide 
usage in government-mandated safety testing of drugs, con- 
sumer products and industrial and agricultural chemicals. 

A number of interesting studies have been published using
2-D mapping to examine effects in the rodent liver. Several
investigators have made use of the technique to
screen for existing genetic variants [8-11] or induced mutations
[12-14], mainly in the mouse. This work builds on the
wealth of genetic information available on the mouse and
its established position as a mammalian mutation-detection
system. While some studies of chemical effects have
been undertaken in the mouse [15-17], most have used the
rat [18-23]. The examination of the cytochrome P-450 system,
in particular, has been carried out almost exclusively
on the rat [24, 25].

These considerations lead us to conclude that rodent liver 
offers the best opportunity to systematically examine an 
array of gene regulation systems, and ultimately to build a 
predictive model of large-scale mammalian gene control. 
The basic underlying foundation of such a project is a reli- 
able, reproducible master 2-D pattern of liver, to which on- 
going experimental results can be referred. In this paper, we 
report such a master pattern for the acidic and neutral pro- 
teins of rat liver (pattern F344MST3). In future, this master 
will be supplemented by maps of basic proteins, and analog- 
ous maps of mouse and human liver. 



2 Materials and methods 
2.1 Sample preparation 

Liver is an ideal sample material for most biochemical stud- 
ies, including 2-D analysis. A sample is taken of approxima- 
tely 0.5 g of tissue from the apical end of the left lobe of the 
liver. Solubilization is effected as rapidly as practical; a 
delay of 5-15 min appears to cause no major alteration in 
liver protein composition if the liver pieces are kept cold 
(e.g., on ice) in the interim. In the solubilization process, 
the liver sample is weighed, placed in a glass homogenizer 
(e.g., 15 mL Wheaton); 8 volumes of solubilizing solution*

* The solubilizing solution is composed of 2% NP-40 (Sigma), 9 M urea
(analytical grade, e.g., BDH or Bio-Rad), 0.5% dithiothreitol (DTT;
Sigma) and 2% carrier ampholytes (pH 9-11 LKB; these come as a 20%
stock solution, so 2% final concentration is achieved by making the final
solution 10% 9-11 Ampholine by volume). A large batch of solubilizer
(several hundred mL) is made and stored frozen at -80 °C in aliquots
sufficient to provide enough for one day's estimated sample preparation
requirement. The solution is never allowed to become warmer
than room temperature at any stage during preparation or thawing for
use, since heating of concentrated urea solutions can produce contaminants
that covalently modify proteins, producing artifactual charge
shifts. Once thawed, any unused solubilizer is discarded.
is added (i.e., 4 mL per 0.5 g tissue) and the mixture is homogenized
using first the loose- and then the tight-fitting
glass pestle. This takes approximately 5 strokes with
each pestle and is carried out at room temperature because
urea would crystallize out in the cold. Once the liver sample
is thoroughly homogenized in the solubilizer, it is assumed
that all the proteins are denatured (by the chaotropic effect
of the urea and NP-40 detergent) and the enzymes inactivated
by the high pH (~9.5). Therefore these samples may
be kept at room temperature until they can be centrifuged
or frozen as a group (within several hours of preparation).
The samples are centrifuged for 6 × 10^6 g·min (e.g., 500 000
× g for 12 min using a Beckman TL-100 centrifuge). The
centrifuge rotor is maintained at just below room temperature
(e.g., 15-20 °C), but not too cold, so as to prevent the
precipitation of urea. The centrifuge of choice is a Beckman
TL-100 because of the sample tube sizes available, but any
ultracentrifuge accepting smallish tubes will suffice. When
an appropriate centrifuge is not available near the site of
sample preparation, samples can be frozen at -80 °C and
thawed prior to centrifugation and collection of supernatants.
Each supernatant is carefully removed following centrifugation
and aliquoted into at least 4 clean tubes for storage.
This is done by transferring all the supernatant to one
clean tube, mixing this gently (to assure homogeneous
composition) and then dividing it into 4 aliquots. The aliquots
are frozen immediately at -80 °C. These multiple aliquots
can provide insurance against a failed run or a freezer
breakdown.
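
The quantities in this protocol are simple proportions, and a short sketch can make them easy to check. The Python below is only an illustration: the function names are hypothetical, and it assumes that one "volume" means 1 mL per gram of tissue and that the centrifugation dose is the product of relative centrifugal force and time.

```python
def solubilizer_volume_ml(tissue_g, volumes=8.0):
    """Solubilizer volume, assuming 1 'volume' = 1 mL per gram of tissue."""
    return tissue_g * volumes


def g_minutes(rcf_x_g, minutes):
    """Centrifugation dose as relative centrifugal force multiplied by time."""
    return rcf_x_g * minutes


def minutes_for_dose(target_g_min, rcf_x_g):
    """Run time needed to reach a target dose at a given force."""
    return target_g_min / rcf_x_g


if __name__ == "__main__":
    print(solubilizer_volume_ml(0.5))      # mL of solubilizer for 0.5 g of liver
    print(g_minutes(500_000, 12))          # dose for 12 min at 500 000 x g
    print(minutes_for_dose(6e6, 250_000))  # run time on a slower rotor
```

The last call shows how the same dose could be reached on a rotor with a lower maximum force, which is the practical meaning of "any ultracentrifuge accepting smallish tubes will suffice".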

2.2 Two-dimensional electrophoresis

Sample proteins are resolved by 2-D electrophoresis using
the 20 × 25 cm Iso-Dalt® 2-D gel system ([26-29]; produced
by LSB and by Hoefer Scientific Instruments, San
Francisco) operating with 20 gels per batch. All first-dimensional
isoelectric focusing (IEF) gels are prepared using the
same single standardized batch of carrier ampholytes
(BDH 4-3A in the present case, selected by LSB's batch-testing
program for rat and mouse database work**). A 10
µL sample of solubilized liver protein is applied to each gel,
and the gels are run for 33 000 to 34 500 volt-hours using a
progressively increasing voltage protocol implemented by
a programmable high-voltage power supply. An Angelique®
computer-controlled gradient-casting system (produced
by LSB) is used to prepare second-dimensional sodium
dodecyl sulfate (SDS) polyacrylamide gradient slab
gels in which the top 5% of the gel is 11%T acrylamide and
the lower 95% of the gel varies linearly from 11% to 18%T.
This system has recently been modified so as to employ a
commercially available 30.8%T acrylamide/N,N'-methylenebisacrylamide
prepared solution (thus avoiding the handling
of the solid acrylamide monomer) and three additional
stock solutions: buffer (made from Sigma pre-set
Tris), persulfate and N,N,N',N'-tetramethylethylenediamine
(TEMED). Each gel is identified by a computer-printed
filter paper label polymerized into the lower left corner
of the gel. First-dimensional IEF tube gels are loaded
directly onto the slab gels without equilibration
and held in place by polyester fabric wedges (Wedgies®,
produced by LSB) to avoid the use of hot agarose.
Second-dimensional slab gels are run overnight, in groups
of 20, in cooled DALT tanks (10 °C) with buffer circulation.
All run parameters, reagent source and lot information,
and notations of deviation from expected results are entered
by the technician responsible on a detailed, multi-page
record of the experiment.

** This material (succeeding certified batches of which are available from
Hoefer Scientific Instruments) has the most linear pH gradient produced
by any ampholyte tested except for the Pharmacia wide range
(which has an unacceptable tendency to bind high-molecular weight
acidic proteins, causing them to streak).

2.3 Staining

Following SDS-electrophoresis, slab gels are stained for
protein using a colloidal Coomassie Blue G-250 procedure.
This procedure (based on the work
of Neuhoff [30, 31]) involves fixation in 1.5 L of 50% ethanol
and 2% phosphoric acid for 2 h, three 30 min washes,
each in 2 L of cold tap water, and transfer to 1.5 L of 34%
methanol, 17% ammonium sulfate and 2% phosphoric acid
for 1 h, followed by the addition of a gram of powdered Coomassie
Blue G-250 stain. Staining requires approximately 4
days to reach equilibrium intensity, whereupon gels are
transferred to cool tap water and their surfaces rinsed to remove
any particulate stain prior to scanning. Gels may be
kept for several months in water with added sodium azide.
The water washes remove ethanol that would dissolve the
stain (and render the system noncolloidal, with high backgrounds).
The concentrated ammonium sulfate and methanol
solution is diluted by equilibration with the water volume
of the gels to automatically achieve the correct final
concentrations for colloidal staining. Practical advantages
of this staining approach can be summarized as follows: (i)
the low, flat background makes computer evaluation of
small spots (max OD < 0.02) possible, especially when
using laser densitometry; (ii) up to 1500 spots can be reliably
detected on many gels (e.g., rat liver) at loadings low
enough to preserve excellent resolution; and (iii) reproducibility
appears to be very good: at least several hundred
spots have coefficients of reproducibility less than 15%.
This value is at least as good as previous CBB methods, and
significantly better than many silver stain systems.

2.4 Positional standardization

The carbamylated rabbit muscle creatine phosphokinase
(CPK) standards [32] are purchased from Pharmacia and
BDH. Amino acid compositions, and numbers of residues
present in proteins used for internal standardization, are
taken from the Protein Identification Resource (PIR) sequence
database [33].

2.5 Computer analysis


Stained slab gels are digitized in red light at 134 micron resolution,
using either a Molecular Dynamics laser scanner
(with pixel sampling) or an Eikonix 78/99 CCD scanner.
Raw digitized gel images are archived on high-density DAT
tape (or equivalent storage media) and a greyscale videoprint
prepared from the raw digital image as hard-copy
backup of the gel image. Gels are processed using the Kepler®
software system (produced by LSB), a commercially
available workstation-based software package built on
some of the principles of the earlier TYCHO system [34-41].
Procedure PROC008 is used to yield a spotlist giving
position, shape and density information for each detected
spot. This procedure makes use of digital filtering, mathematical
morphology techniques and digital masking to remove
the background, and uses full 2-D least-squares optimization
to refine the parameters of a 2-D Gaussian shape
for each spot. Processing parameters and file locations are
stored in a relational database, while various log files detailing
operation of the automatic analysis software are archived
with the reduced data. The computed resolution and
level of Gaussian convergence of each gel are inspected
and archived for quality control purposes.

Experiment packages are constructed using the Kepler ex- 
periment definition database to assemble groups of 2-D 
patterns corresponding to the experimental groups (e.g.,
treated and control animals). Each 2-D pattern is matched 
to the appropriate "master" 2-D pattern (pattern 
F344MST3 in the case of Fischer 344 rat liver), thereby 
providing linkage to the existing rodent protein 2-D data- 
bases. The software allows experiments containing hun- 
dreds of gels to be constructed and analyzed as a unit, with 
up to 100 gels displayed on the screen at one time for com- 
parative purposes and multiple pages to accommodate ex- 
periments of > 1000 gels. For each treatment, proteins 
showing significant quantitative differences vs. appropriate 
controls are selected using group-wise statistical parame- 
ters (e.g., Student's t-test, Kepler* procedure STUDENT). 
Proteins satisfying various quantitative criteria (such as P <
0.001 difference from appropriate controls) are represented
as highlighted spots onscreen or on computer-plotted
protein maps and stored as spot populations (i.e., logical
vectors) in a liver protein database. Quantitative data
(spot parameters, statistical or other computed values) are 
stored as real-valued vectors in the database. Analysis of coregulation
is performed using a Pearson product-moment
correlation (Kepler procedure CORREL) to determine
whether groups of proteins are coordinately regulated by
any of the treatments. Such groups can be presented graphically
on a protein map, and reported together with the statistical
criteria used to assess the level of coregulation. Multivariate
statistical analysis (e.g., principal components analysis)
is performed on data exported to SAS (SAS Institute).
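
The two statistics named above can be illustrated in a few lines. The sketch below is a generic, standard-library Python rendering of a pooled-variance Student's t statistic and a product-moment correlation; it is not the Kepler STUDENT or CORREL code, whose internals are not described here, and the function names and sample abundances are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(treated, control):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(treated), len(control)
    dof = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / dof
    t = (mean(treated) - mean(control)) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, dof


def product_moment_r(x, y):
    """Product-moment correlation between two spots' abundances across gels."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical abundances of one spot in treated and control animals.
    print(student_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
    # Hypothetical abundances of two spots across the same three gels.
    print(product_moment_r([1.0, 2.0, 3.0], [6.0, 4.0, 2.0]))
```

A correlation near +1 would mark a coordinately regulated pair and a value near -1 an inversely regulated pair; converting t to a P value needs the t distribution, which is left to a statistics package.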

2.6 Graphical data output 

Graphical results are prepared in GKS and translated 
within Kepler* into output for any of a variety of devices. 
Linedrawing output is typically prepared as Postscript and 
printed on an Apple LaserWriter. Detailed maps presented 
here have been generated using an ultra-high-resolution 
Postscript-compatible Linotronic output device. Greyscale 
graphics are reproduced from the workstation screen using 
a Seikosha videoprinter. Patterns are shown in the standard 
orientation, with high molecular mass at the top and acidic 
proteins to the left. 

2.7 Experiment LSBC04 

In the study described here 12-week-old Charles River 
male F344 rats were used. Diets were prepared at LSB, 
based on a Purina 5755M Basal Purified Diet. Lovastatin 
and cholestyramine were obtained as prescription pharmaceuticals,
ground and mixed with the diet at concentrations
of 0.075% and 1%, respectively. The high cholesterol diet
was Purina 5801M-A (5% cholesterol plus 1% sodium cholate
in the control diet). Animal work was carried out by Microbiological
Associates (Bethesda, MD). Animals were acclimatized
for one week on the control diet, fed test or control
diets for one week, and sacrificed on day 8. Average
daily doses of lovastatin and cholestyramine in appropriate 
groups were 37 mg/kg/day and 5 g/kg/day, respectively, 
based on the weight of the food consumed. Liver samples 
were collected and prepared for 2-D electrophoresis accord- 
ing to the standard liver protocol (homogenization in 8 
volumes of 9 M urea, 2% NP-40, 0.5% dithiothreitol, 2%
LKB pH 9-11 carrier ampholytes, followed by centrifugation
for 30 min at 80 000 × g). Kidney, brain and plasma
samples were frozen. Gels were run as described above,
and the data were analyzed using the Kepler® system. Gels
were scaled, to remove the effect of differences in protein 
loading, by setting the summed abundances of a large num- 
ber of matched spots equal for each gel (linear scaling). 
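
Linear scaling of this kind can be sketched as follows. This Python is an illustration only: the data layout (a dictionary of gels, each mapping spot numbers to abundances), the function name, and the choice of the mean summed abundance as the common target are all assumptions, since the text states only that the summed abundances of matched spots are set equal.

```python
def scale_gels(gels, reference_spots):
    """Rescale each gel so the summed abundance of the reference spots is equal.

    gels: {gel_id: {spot_number: abundance}}
    reference_spots: spot numbers matched on every gel
    """
    totals = {gel_id: sum(spots[s] for s in reference_spots)
              for gel_id, spots in gels.items()}
    # Assumed target: the mean of the per-gel totals.
    target = sum(totals.values()) / len(totals)
    return {gel_id: {s: a * target / totals[gel_id] for s, a in spots.items()}
            for gel_id, spots in gels.items()}


if __name__ == "__main__":
    # Hypothetical gels: the second was loaded with twice as much protein.
    gels = {"gel_a": {101: 10.0, 102: 30.0},
            "gel_b": {101: 20.0, 102: 60.0}}
    print(scale_gels(gels, [101, 102]))
```

Because every spot on a gel is multiplied by the same factor, ratios between spots within a gel are unchanged; only the loading difference between gels is removed.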



3 Results and discussion 

3.1 The rat liver protein 2-D map 

F344MST3 is a standard 2-D pattern of rat liver proteins, 
based on the Fischer 344 strain. This pattern was initiated 
from a single 2-D gel and extensively edited in an experi- 
ment comparing it to a range of protein loads, so as to in- 
clude both small spots and well-resolved representations of 
high-abundance spots. More than 700 rat liver 2-D patterns 
have been matched to F344MST3 in a series of drug effects 
and protein characterization experiments, and numerous 
new spots (induced by specific drugs, for instance) have 
been added as a result. A modified version including addi- 
tional spots present in the Sprague-Dawley outbred rat has 
also been developed (data not shown). Figure 1 shows a 
greyscale representation and Fig. 2 a schematic plot of the 
master pattern. More than 1200 spots are included, most of 
which are visible on typical gels loaded with 10 µL of solubilized
liver protein prepared by the standard method and
stained with colloidal Coomassie Blue. Master spot num- 
bers (MSN's) have been assigned to all proteins, and ap- 
pear in the following figures, each showing one quadrant of 
the pattern. Figure 3 shows the upper left (acidic, high 
molecular mass) quadrant, Fig. 4 the upper right (basic, 
high molecular mass) quadrant, Fig. 5 the lower left (acidic, 
low molecular mass) quadrant, and Fig. 6 the lower right 
(basic, low molecular mass) quadrant. The quadrants overlap
as an aid to moving between them. The gel position (in
100 micron units), isoelectric point (relative to the CPK internal
pI standards) and SDS molecular mass (from the calibration
curve in Fig. 8) are listed for each spot (Table 1). Because
of the precision of the CPK-pI values, these parameters
can be used to relate spot locations between gel systems
more reliably than using pI measurements expressed
as pH. A major objective of current studies is the identification
of all major spots corresponding to known liver proteins,
as well as rigorous definitions of subcellular organelle
contents. Of particular interest to us is the parallel development
of identifications in the rat and mouse liver
maps, allowing detailed comparisons of gene expression effects
in the two systems. The results of these studies will be
presented systematically in a later edition of this database,
but we include here a useful series of 22 orienting identifications
as an aid to other users of the rat liver pattern (Table 2).

3.2 Carbamylated charge standards, computed pI's and
molecular mass standardization

We have previously shown that the use of a system of closely-spaced
internal pI markers (made by carbamylating a
basic protein) offers an accurate and workable solution to
the problem of assigning positions in the pI dimension [32].
The same system, based on 36 protein species made by carbamylating
rabbit muscle CPK, has been used here to assign
pI's to most rat liver acidic and neutral proteins. The
standards were coelectrophoresed with total liver proteins,
and the standard spots added to a special version of the
master pattern F344MST3. The gel x-coordinates of all
liver protein spots lying within the CPK charge train were
then transformed into CPK pI positions by interpolation
between the positions of immediately adjacent standards
(Table 1) using a Kepler® vector procedure.
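
The interpolation step can be sketched in a few lines of Python. This is not the Kepler vector procedure itself; the function name and the standard positions below are hypothetical, and the sketch assumes each standard is numbered by its position in the charge train (more negative toward the acidic, left-hand side of the gel).

```python
from bisect import bisect_right


def cpk_pi(x, standards):
    """Interpolate a CPK pI position from a gel x-coordinate.

    standards: list of (x_coordinate, cpk_index) pairs sorted by x, one pair
    per carbamylated CPK spot. Returns None outside the charge train.
    """
    xs = [sx for sx, _ in standards]
    if not xs[0] <= x <= xs[-1]:
        return None
    i = bisect_right(xs, x)
    if i == len(xs):  # x coincides with the last standard
        return float(standards[-1][1])
    (x0, p0), (x1, p1) = standards[i - 1], standards[i]
    return p0 + (p1 - p0) * (x - x0) / (x1 - x0)


if __name__ == "__main__":
    # Hypothetical x-coordinates (100 micron units) for three adjacent standards.
    standards = [(800.0, -12), (860.0, -11), (930.0, -10)]
    print(cpk_pi(836.0, standards))  # a spot lying between the first two standards
    print(cpk_pi(700.0, standards))  # a spot outside the charge train
```

Only the two standards that bracket a spot influence its value, so local distortions of the pH gradient affect neighbouring spots and standards alike.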

It has proven possible to compute fairly accurate pI values
for many proteins from the amino acid composition [42].
We have attempted here to test a further elaboration of this
approach, in which we computed pI's for the CPK standards
themselves, based on our knowledge of the rabbit muscle
CPK sequence and the fact that adjacent members of the
charge train typically differ by blockage of one additional lysine
residue (Table 3). We compared these values to similar
computed pI's for an additional set of carbamylated standards
made from human hemoglobin beta chains and a series
of rat liver and human plasma proteins of known position
and sequence (Fig. 7, Table 4). The result demonstrates
good concordance between these systems. Two proteins
show significant deviations: liver fatty-acid binding protein
(FABP; #1 in Table 4) and protein disulphide isomerase
(#20 in the table). The FABP spot present on F344MST3
may represent a charge-modified version of a more basic
parent spot closer to the expected pI, not resolved in the
IEF/SDS gel. Of particular importance is the fact that, by
comparing computed pI's of sequenced but unlocated proteins
with the CPK pI's, we can assign a probable gel location
without making any assumptions regarding the actual
gel pH gradient. This offers a useful shortcut, given the vagaries
of pH measurement on small diameter IEF gels. We
have used this approach to compute the CPK pI's of all rat
and mouse proteins in the PIR sequence database, as an aid
to protein identification (data not shown).
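
A computed pI and a computed charge train can be sketched as below. This is a generic Henderson-Hasselbalch calculation, not the authors' procedure from [42]: the pKa values are approximate textbook figures chosen for illustration, the composition is hypothetical (it is not the CPK sequence), and the function names are assumptions.

```python
# Approximate pKa values for ionizable groups (illustrative assumptions).
PKA_BASIC = {"Nterm": 8.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"Cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(counts, ph):
    """Net charge of a protein with the given ionizable-group counts at a pH."""
    positive = sum(n / (1.0 + 10.0 ** (ph - PKA_BASIC[g]))
                   for g, n in counts.items() if g in PKA_BASIC)
    negative = sum(n / (1.0 + 10.0 ** (PKA_ACIDIC[g] - ph))
                   for g, n in counts.items() if g in PKA_ACIDIC)
    return positive - negative


def isoelectric_point(counts, tol=1e-4):
    """Find the pH of zero net charge by bisection (charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(counts, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def carbamylation_train(counts, max_blocked):
    """Computed pI for 0, 1, 2, ... blocked lysines, one value per train member."""
    lysines = counts.get("K", 0)
    train = []
    for blocked in range(min(max_blocked, lysines) + 1):
        modified = dict(counts)
        modified["K"] = lysines - blocked
        train.append(isoelectric_point(modified))
    return train


if __name__ == "__main__":
    # Hypothetical composition of ionizable groups.
    counts = {"Nterm": 1, "Cterm": 1, "K": 5, "R": 3, "H": 2, "D": 4, "E": 4}
    print(carbamylation_train(counts, 5))
```

Each blocked lysine removes one positive charge, so the computed pI falls step by step along the train, which is the behaviour the carbamylated standards are designed to show on the gel.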

In order to standardize SDS molecular weight (SDS-MW), 
we have used a standard curve fitted to a series of identified 
proteins (Fig. 8). Rather than using molecular mass per se,
we have elected to use the number of amino acids in the 
polypeptide chain, as perhaps a better indication of the 
length of the SDS-coated rod that is sieved by the second 
dimension slab. The resulting values were multiplied by 
112 (the weighted average mass of amino acids in se- 
quenced proteins) to give predicted molecular masses. Be- 
cause we use gradient slabs, we have not constrained the fit- 
ted curve to conform to any predetermined model; rather 
we tried many equations and selected the best using the 
program "Tablecurve" on a PC. The equation chosen was
$y = a + bx + c/x^2$, where y is the number of residues, x is the gel
y-coordinate, a is 511.83, b is -0.2731 and c is 33183801. The
resulting fit appears to be fairly good over a broad range of
molecular mass.
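
The standard curve can be applied as in the sketch below. Two caveats apply: the equation in the scanned source is damaged, so the functional form used here (a term in the inverse square of the gel coordinate) and the decimal placement of the coefficients are a reading of the scan, labeled as an assumption; and the function names are hypothetical. The factor of 112 is the average residue mass given in the text.

```python
# Coefficients as read from the damaged scan (assumed placement of decimals).
A = 511.83
B = -0.2731
C = 33183801.0
MEAN_RESIDUE_MASS = 112.0  # weighted average residue mass stated in the text


def residues_from_gel_y(gel_y):
    """Number of residues from the gel y-coordinate (100 micron units).

    Assumed form: residues = A + B * y + C / y**2.
    """
    return A + B * gel_y + C / gel_y ** 2


def predicted_mass(gel_y):
    """Predicted molecular mass: residue count times the mean residue mass."""
    return MEAN_RESIDUE_MASS * residues_from_gel_y(gel_y)


if __name__ == "__main__":
    for gel_y in (200.0, 500.0, 1000.0, 1500.0):
        print(gel_y, round(predicted_mass(gel_y)))
```

Under this reading the curve falls steeply near the top of the gel and almost linearly lower down, which is the qualitative shape expected for a gradient slab.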



3.3 An example of rat liver gene regulation: Cholesterol
metabolism 

Experiment LSBC04 was designed as a small-scale test of 
the regulation of cholesterol metabolism in vivo by three 
agents included in the diet: lovastatin (Mevacor®, an inhibitor
of HMG-CoA reductase); cholestyramine (a bile acid
sequestrant that has the effect of removing cholesterol
from the gut-liver recirculation); and cholesterol itself. The 
first two agents should lower available cholesterol and the 
third should raise it, allowing manipulation of relevant 
gene expression control systems in both directions. Such 
an experiment offers an interesting test of the 2-D mapping 
system since most of the pathway enzymes are present in 
low abundance, many are membrane-bound and difficult 
to solubilize,and the pathway itself is complex. Approxima- 
tely 1000 proteins were separated and detected in liver ho- 
mogenates. Twenty-one proteins were found to be affected 
by at least one treatment, and these could be divided into 
several coregulated groups. 

3.3.1 MSN 413 (putative cytosolic HMG-CoA synthase)
and sets of spots regulated coordinately or inversely

One group of spots (including a spot assigned to the cytosolic
HMG-CoA synthase, MSN 413) showed the expected
increase in abundance with lovastatin or cholestyramine,
the synergistic further increase with lovastatin and cholestyramine,
and a dramatic decrease with the high cholesterol
diet. Spot number 413 is the most strongly regulated protein
in the present experiment, showing a 5- to 10-fold induction
after a 1 week treatment with 0.075% lovastatin and
1% cholestyramine in the diet (Figs. 9 and 10). Its expression
follows precisely the expectation for an enzyme whose
abundance is controlled by the cholesterol level; it is progressively
increased from the control levels by cholestyramine,
lovastatin and lovastatin plus cholestyramine, and it
sinks below the threshold of detection in animals fed the
high cholesterol diet. This spot has been tentatively identified
as the cytosolic HMG-CoA synthase, based on a reaction
with an antiserum to that protein provided by Dr. Michael
Greenspan at Merck Sharp & Dohme Research Laboratories.
This enzyme lies immediately before HMG-CoA
reductase in the liver cholesterol biosynthesis pathway, and
is known to be co-regulated with it. Spot 413 has an SDS
molecular weight of about 54 000 and a CPK pI of -11.4, in
reasonably close agreement with a molecular weight of
57 300 and a CPK pI of -15.7 computed from the known sequence
of the hamster enzyme [43].

Using a classical product-moment correlation test (Kepler
procedure CORREL), a series of five additional spots was
found to be coregulated with 413. The level of correlation
was exceedingly high (> 95%). Two of these, 1250 and 933,
are at similar molecular weights and approximately one
charge more acidic than 413 (Fig. 9), indicating that they
may be covalently modified forms of the 413 polypeptide.
This suspicion is strengthened by the observation that both
spots are also stained by the antibody to cytosolic HMG-CoA
synthase. The remaining three correlated spots appear
to comprise an additional related pair (1253 and 3001) of
around 40 kDa and a single spot (1139) of around 28 kDa.
Because these two presumed proteins are present at substantially
lower abundances than 413, and because the cytosolic
HMG-CoA synthase is reported to consist of only one
type of polypeptide, they are likely to represent other, very
tightly coregulated enzymes. A second group of six spots
was selected based on a regulatory pattern close to the inverse
of that for spot 413 (MSN's 34, 79, 178, 182, 204, 347;
data not shown). For these proteins, the lowest level of expression
occurs with exposure to lovastatin plus cholestyramine
and the highest level upon exposure to the high-cholesterol
diet. Spots 182 and 79 are highly correlated and lie
about one charge apart at the same molecular weight; they
may thus be isoforms of a single protein. The other four
spots probably represent additional enzymes or subunits.
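
The product-moment (Pearson) correlation screen described above can be illustrated with a minimal Python sketch. It does not reproduce the Kepler CORREL procedure, whose internals are not given here. The function names, the 0.95 cutoff (one reading of the "> 95%" figure) and the abundance values are all hypothetical, chosen only to show how spots coregulated with a reference spot might be picked out across a set of gels.

```python
import math


def pearson_r(xs, ys):
    """Classical product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def coregulated(reference, candidates, threshold=0.95):
    """Return the names of candidate spots whose abundances track the reference.

    `candidates` maps a spot name to its abundance in each gel, in the same
    gel order as `reference`. The threshold is an assumed cutoff.
    """
    return sorted(
        name for name, values in candidates.items()
        if pearson_r(reference, values) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical abundances over five gels (not data from the paper).
    reference = [1.0, 2.0, 3.0, 4.0, 10.0]
    candidates = {
        "A": [2.0, 4.0, 6.0, 8.0, 20.0],   # rises with the reference
        "B": [10.0, 4.0, 3.0, 2.0, 1.0],   # roughly inverse pattern
        "C": [1.0, 1.0, 2.0, 1.0, 1.0],    # unrelated pattern
    }
    print(coregulated(reference, candidates))
```

An inversely regulated group, like the second group of six spots, could be screened the same way by testing for a strongly negative coefficient instead.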

3.3.2 MSN 235 and coregulated spots

A third group of five spots, mainly comprised of mitochondrial
proteins including putative mitochondrial HMG-CoA
synthase spots, showed a modest induction by lovastatin
alone, but little or no effect with any of the other treatments
(including the combination of lovastatin and cholestyramine;
Fig. 12). This result is intriguing because lovastatin
was expected to affect only the regulation of enzymes of
cholesterol synthesis, which is entirely extra-mitochondrial.
Three of the spots (235, 134, 144) form a closely-packed
triad at approximately 30 kDa, and are likely to represent
isoforms of one protein. All three spots are stained
by an antibody to the mitochondrial form of HMG-CoA
synthase obtained from Dr. Greenspan. Subcellular fractionation
indicates a mitochondrial location. The other two
spots (633 at about 38 kDa and 724 at about 69 kDa) are
each present at lower abundance than the members of the
triad.

3.3.3 An example of an anti-synergistic effect

A sixth spot (367) shows strong induction by lovastatin 
(two- to threefold), and about half as much induction with 
lovastatin plus cholestyramine, but without sharing the ani- 
mal-animal heterogeneity pattern of the 235-set (Fig. 13). 
This protein is also mitochondrial, and represents the clear- 
est example of an anti-synergistic effect of lovastatin and 
cholestyramine. The existence of such an effect demon- 
strates that lovastatin and cholestyramine do not act exclu- 
sively through the same regulatory pathway. 

3.3.4 Complexity of the cholesterol synthesis pathway

Taken together, these results suggest that treatment with lovastatin
alone can affect both cytosolic and mitochondrial
pathways using HMG-CoA, while cholestyramine, on the
other hand, either alone or in combination with lovastatin,
produces a strong effect on the putative cytosolic pathway,
but little or no effect on the putative mitochondrial pathway.
An explanation for this difference may lie in lovastatin's
effect on levels of HMG-CoA and related precursor
compounds that are exchanged between the cytosol and
the mitochondrion, whereas cholestyramine should affect
only the cytosolic pathways directly controlled by cholesterol
and bile acid levels. It remains to be explained why some
proteins of the putative mitochondrial pathway are so
much more variable in their expression in all groups. An examination
of all the coregulated groups suggests that quantitative
statistical techniques can extract a wealth of interesting
information from large sets of reproducible gels. The
abundance of spots in the 413 coregulation group, for example,
shows an amazing level of concordance in their relative
expression among the five individuals of the lovastatin and
cholestyramine treatment group. This effect is not due to
differences in total protein loading, since they have already
been removed by scaling, and since proteins with quite different
regulation patterns can be demonstrated (e.g., Fig.
13). Such effects raise the possibility that many gene coregulation
sets may be revealed through the study of a sufficiently
large population of control animals (i.e., without
any experimental manipulation). This approach, exploiting
natural biological variation in protein expression instead of
drug effects, offers an important incentive for the construction
of a large library of control animal patterns.



4 Conclusions 

Because of the widespread use of rat liver in both basic biochemistry
and in toxicology, there is a long-term need for a
comprehensive database of liver proteins. The rat liver master
pattern presented here has proven to be an accurate representation
of this system, having been matched to more
than 700 gels to date. As the number of proteins identified
and the number of compounds tested for gene expression
effects grows, we expect this database to contribute valuable
insights into gene regulation. Its practical utility in several
areas of mechanistic toxicology is already being demonstrated.

Received September 11, 1991 
Figure 8. Plot of number of amino acids versus gel Y-position, with fitted
curve used to predict molecular mass of unidentified proteins.



Figure 7. (a) Plot of computed isoelectric point versus gel X-position (axis: CPK position) for
two sets of carbamylated standard proteins (rabbit muscle CPK, +, and
human hemoglobin β chain, filled diamonds) and several other proteins
(shaded squares). (b) The identities of the various proteins represented
by the squares are indicated by the numbers in corresponding positions
on (a); these refer to Table 4.
Figure 9. Montage showing effects in the
region of MSN:413. The montage shows a
small window into one portion of the 2-D
pattern, one row of windows for each experimental
group, and one panel for each gel
in the experiment. The left-most pattern
in each row is a group-specific copy of the
master pattern followed by the patterns
for the five individual rats in the group.
The highlighted protein spots (filled circles)
are spot 413 (on the right of each panel;
identified as cytosolic HMG-CoA synthase)
and two modified forms of it (1250
and 933). From the top, the rows (experimental
groups) are: high cholesterol, controls,
cholestyramine, lovastatin, and lovastatin
plus cholestyramine.
Figure 10. Bargraph showing the quantitative
effects of various treatments on the
abundance of MSN:413 (cytosolic HMG-CoA
synthase) in the gels of Fig. 9.
Figure 11. Bargraphs of a series of six coregulated
spots including MSN:413. In the
bargraphs, the abundances of the appropriate
spot (master spot number shown at
the top of the panel) in each animal are
shown. The five five-animal groups are in
the order (left to right): high cholesterol,
controls, cholestyramine, lovastatin, and
lovastatin plus cholestyramine. Each bar
within a group represents one experimental
animal liver (one 2-D gel). Note the correlated
expression of the 6 spots, especially
in the two far right (most strongly induced)
groups.





Figure 13. Data on spot MSN:367, presented as in Fig. 11. This protein
shows unambiguously the anti-synergistic effect of lovastatin and cholestyramine
(fifth group) as compared to lovastatin (fourth group). This response
contrasts strongly with the regulation pattern seen in Fig. 11.
MSN      X      Y
3      311    434
5      568    263
8      812    426
11     548    268
15     645    520
17     628    588
16     806    414
16     755    298
20     648    403
21    1204    448
22     332    434
23     787    424
24     313    417
25     807    516



CPK pI  SDSMW



MSN 



Y  CPK pI  SDSMW



27 1184 

28 1263 



29 
30 



743 
768 



32 1216 

-33 1145 

34 1037 

35 863 



36 
38 
39 



712 
763 
304 



41 1165 

42 664 

43 1316 

44 1924 

46 1203 

47 1391 



48 
49 

50 
51 



309 
605 
621 
1113 



52 1620 

53 725 

54 2001 

55 722 

56 678 

57 1662 

58 1091 

59 1171 

60 1400 

61 1853 

62 1688 

65 735 

66 1263 

67 1252 

68 779 

69 1064 

71 656 

72 638 

73 1562 

74 1570 

75 1264 

76 1338 

77 1633 
76 1767 

79 925 

80 534 

61 1811 

62 1412 
83 1471 

64 1662 

65 1596 

66 1817 
87 



524 
446 

605 
112 
417 
445 
555 
412 
606 
694 
470 
568 
607 
589 
362 
586 
447 
454 
587 
535 
522 
499 
177 
500 
830 
533 
302 
580 
565 
624 
506 
567 
297 
312 
407 
682 
296 
569 
545 
563 
556 
621 
564 
363 
565 
738 
698 
363 
681 
347 
563 
479 
301 



88 1589 
69 1706 



90 
91 



516 1371 
698 
719 



651 
1415 



92 1773 

93 1338 

94 1708 



329 
710 
545 
446 
696 



«-35.0 
-24.3 
•16.0 
-25.2 
-15.3 
-21.6 
-14.0 
-17.5 
-20.9 
-8.7 
<-35.0 
-16.6 
«-35.0 
-16.1 
-9.0 
-8.0 
-17.8 
-17.2 
-8.6 
-6.5 
-11.3 
-14.9 
-18.7 
-17.3 
c-35.0 
-9.2 
-18.6 
-7.3 
-0.1 
-8.7 
-6.3 
<-35.0 
•22.5 
•21.8 
-10.0 
-0.9 
-18.3 
>0.0 
-18.4 
-18.8 
-2.5 
-10.3 
-9.2 
-6.2 
-0.6 
-0.4 
-18.1 
-6.0 
-6.1 
-16.8 
•10.6 
•20.6 
-21.2 
-3.6 
-3.8 
-6.0 
•7.0 
-0.8 
•1.5 
-13.6 
-26.1 
-1.0 
-6.0 
-5.0 
-2.7 
-3.4 
-0.9 
27.0 
-3.5 
-2.2 
20.6 
-6.0 
-1.4 
-7.0 
-2.2 



63,800 
102,900 
64,600 

101,000 
55,200 
50.000 
66 v 300 
90,200 
67,900 
62,100 
63,600 
65,000 
66.000 
55,500 
54,900 
62,400 
49.000 
346.600 
66,000 
62,500 
52.400 
66.600 
46,900 
43,600 
59.800 
51.400 
46,800 
50.000 
74,600 
50.200 
62.300 
61.500 
50,100 
53.900 
55.000 
57,000 
170.800 
56,900 
37.300 
54.100 
89.000 
50,600 
50.300 
47.600 
56.200 
51,500 
90.500 
65.900 
67.300 
43.900 
90,600 
50,000 
53,100 
50,400 
52.300 
46.000 
51.800 
74.400 
51.700 
41.600 
43.600 
74.500 
44.500 
77.500 
51.800 
56.900 
89.100 
17,400 
43,600 
42.500 
61,700 
43.000 
53.200 
62.300 
43.700 



95 
96 
97 
96 
99 
100 
101 
102 
103 
104 
105 
106 
107 
108 
109 
110 
111 
113 
114 
115 
116 
117 
116 
120 
121 
122 
123 
124 
125 
126 
127 
128 
129 
130 
131 
132 
133 
134 
135 
136 
137 
136 
139 
140 
141 
142 
143 
144 
145 
146 
147 
146 
146 
150 
151 
152 
153 
154 
155 
156 
157 
156 
159 
160 
161 
162 
164 
166 
167 
166 
169 
170 
171 
172 
173 



1119 
1731 
1033 
1406 
578 
2004 
1106 
462 
665 
773 
312 
1769 
1565 
1692 
1482 
778 
1728 
1191 
1296 
662 
1146 
1548 
1050 
1530 
636 
1572 
23 
621 
1296 
672 
1000 
1229 
1422 
1776 
1930 
660 
666 
1271 
1161 
453 
1656 
1504 
1486 
1689 
311 
1366 
1429 
615 
2006 
2006 
1070 
1347 
541 
1645 
1269 
1507 
1722 
932 
1031 
1970 
1256 
1275 
1663 
1034 
1953 
1020 
1566 
1905 
1340 
1506 
1336 
1969 
600 
476 
919 



536 
756 
566 
565 
1149 
536 
623 
455 
630 
1182 
1117 
509 
720 
607 
593 
516 
700 
660 
165 
907 
610 
649 
577 
628 
423 
712 
1433 
1474 
662 
921 
717 
311 
832 
499 
757 
537 
1019 
662 
1386 
1063 
823 
697 
707 
756 
1417 
915 
346 
1017 
566 
516 
1108 
578 
1461 
760 
236 
911 
446 
503 
294 
684 
183 
417 
820 
527 
771 
1482 
606 
565 
181 
563 
678 
541 
378 
956 
1314 



•9.9 
•2.0 
-11.4 
-6.1 
-23.8 
>0.0 
-10.1 
•26.5 
•20.2 
-17.0 
<*35.0 
-1.5 
-3.6 
•2.4 
-4.8 
•16.9 
-2.0 
-6.9 
-7.5 
-19.6 
-9.5 
-4.1 
•11.1 
-4.3 
-15.4 
•3.6 
<-35.0 
-21.8 
-7.5 
-14.7 
•12.0 
-6.4 
•5.8 
-1.4 
-0.1 
-20.4 
-20.2 
-7.9 
-9.3 
-29.7 
-0.6 
-4.6 
-4.8 
•2.4 
<-35.0 
-6.7 
-5.7 
•22.1 
>0.0 
>0.0 
-10.7 
•6.9 
-25.7 
-2.6 
-7.9 
-4.5 
-2.1 
•13.5 
-11.4 
>0.0 
•6.1 
-7.8 
•2.6 
-11.4 
>0.0 
-11.6 
-3.8 
-0.2 
-7.0 
-4.6 
-7.0 
>0.0 
-16.3 
-26.7 
•13.7 



53.800 
40.700 
51,600 
51,700 
25,000 
53.700 
47,900 
61.300 
37,300 
23.800 
26.100 
56.100 
42,500 
38.300 
49.700 
55.500 
43.500 
44.500 
160.800 
34,100 
48,700 
36,500 
50,800 
37,400 
65.200 
42.900 
15.30C 
13,900 
36,000 
33.50C 
42.600 
86,100 
37.3a 
57,000 
40,700 
53,800 
29,700 
36,000 
16.80C 
26.100 
37,700 
43,700 
43,200 
40,700 
15,800 
33.800 
77,900 
29,800 
51,600 
55.300 
26.500 
50,800 
13,700 
40,500 
117,000 
33.900 
62,100 
56,600 
91,400 
44.400 
162,400 
65,900 
37,800 
54,600 
40,000 
13,700 
38,400 
51,700 
164.900 
50,400 
44.700 
53,500 
71.600 
32.100 
19.300 



MSN 



Y  CPK pI  SDSMW



174 1364 

175 625 

177 1562 

178 1321 

179 1089 

180 1866 
411 
804 



181 
162 

184 1860 

185 1997 



186 

187 



279 
773 



188 1538 

191 1560 

192 1818 

193 1469 

194 1380 

195 784 

196 1227 

197 667 

198 2006 

199 1711 



200 
201 
202 
203 



872 
292 
736 
786 



204 1224 

205 439 

206 1994 

207 1895 

208 240 
210 1700 



211 
213 
214 



902 
1067 
1340 



215 1591 

216 1585 

217 1159 

218 931 



219 
220 
221 
223 
225 
226 



713 
1479 

965 

934 
1812 

621 



227 1586 

228 1065 

229 1577 

230 1456 



232 
234 
235 
236 
237 
238 
239 
240 
241 
242 
243 
244 
245 
246 
247 
246 
249 
250 
251 
252 
253 
254 
255 
256 
257 
256 



1440 

1692 
618 
920 
952 
1611 
1489 
501 
1620 
1357 
711 
1855 
1169 
551 
1348 
460 
1733 
1974 
808 
874 
753 
995 
1690 



163. 
393 

553 
710 
615 
567 
295 
730 
696 

1017 

1113 

296 

807 

674 

687 

555 

266 

632 
1185 

553 

681 

674 

424 



829 
569 
983 
571 
667 
1418 
499 
517 
684 
668 
495 
755 
393 
572 
177 
911 
927 
716 
1045 
411 
1463 
567 
890 
496 
849 
469 
1004 
1138 
1008 
541 
720 
448 
569 
658 
1182 
621 
474 
459 
604 
448 
451 
788 
392 
553 
646 
450 
679 



-6.7 
-15.7 
-3.6 
-7.2 
-10.4 
-0.5 
-32.1 
-16.2 
-0.6 
>0.0 
<-35.0 
-17.0 
-4.2 
-3.9 
-0.9 
-5.0 
-6.4 
-16.7 
-8.4 
-20.1 
>0.0 
•2.2 
-14.7 



435 <-35.0 
253 -18.0 



994 1006 
508 464 
1517 820 



-16.7 
-8.5 
-30.9 
>0.0 
-0.3 

<-35.0 
•2.3 
-14.1 
-10.4 
-7.0 
■3.5 
-3.6 
-9.3 
-13.5 
-18.7 
-4.8 
•12.8 
-13.5 
-1.0 
•15.8 
•3.6 

•10.8 
-3.7 
-5.2 
-5.5 
-2.4 

-22.0 

-13.7 

-13.1 
-3.2 
-4.8 

-27.7 
-0.9 
-6.6 

-18.7 
-0.6 
•6.9 

-25.1 
-6.9 

•29.3 
•1.9 
>0.0 

-16.1 

-14.6 

-17.6 

•12.1 
•2.4 

•12.1 

-27.4 
-4 4 



162.900 
60.300 
52.600 
43,000 
46.300 
51.600 
91,200 
42,000 
34.500 
29.600 
26.300 
90.800 
38.400 
44.900 
44.200 
52,400 
101.600 
47.300 
23.700 
52,600 
44.500 
44.900 
65.000 
63,700 
107.800 
37.400 
50.000 
31.100 
51.300 
44,200 
15.800 
57.000 
55.400 
44,400 
45.200 
57.300 
40.700 
69.300 
51.200 
170.500 
33.900 
33.300 
42.700 
28.800 
66,800 
13.600 
51,600 
34,800 
57,300 
36.500 
57.900 
30,300 
25,400 
30.200 
53,500 
42.500 
62.100 
51.400 
45.800 
23.800 
46.000 
59.300 
61.000 
49,100 
62.100 
61.800 
39.200 
69.500 
52.500 
36.500 
61.900 
44.600 
30.200 
60.400 
37.800 
259 1706 061 

260 661 1961 

261 1725 

262 406 

263 1063 
265 1300 
266 



267 
266 

260 1044 
270 2010 



670 

1127 
172 
673 

510 437 
660 1038 
430 



271 
272 



274 1292 

275 1350 
276 
277 
278 
270 
281 



061 
606 
853 
422 
966 
712 
500 

1670 1069 



857 
895 



538 
718 
570 



688 
061 
870 
1848 1064 

282 1505 525 

283 1313 1147 
1314 



284 

285 1332 

286 1277 
268 1301 
280 1147 



290 
201 



025 



820 
406 
652 
824 
570 
511 



292 1462 

293 531 
204 860 
295 1162 



787 1476 
816 



296 218 

297 1377 



299 
300 2012 
702 



609 
814 
070 

013 1523 



667 
178 



301 
302 
303 

304 1843 1585 

305 1040 

306 1606 



404 1280 
403 1006 



503 



-1.1 
-20.4 
-2.0 

-28.0 
-10.0 
4.3 
-27.3 
-20.4 
-31.0 
-11.2 
>0.0 
-15.0 
-14.2 
-7.6 
4.0 
•2.6 
•10.4 
-13.0 
-14.5 
-0.7 
-4.6 
-7.3 
-7.3 
-7.1 
-7.8 
-6.3 
-9.5 
-13.6 
•16.6 
-5.1 
•26.3 
-14.0 
-0.3 
<-35.0 
4.5 
-13.0 
>0.0 
•10.0 
-28.1 
-32.6 
-0.7 
-11.1 
-3.3 



MSN      X      Y   CPK pI
307   1210    016      4.5
308   1627    755      4.0
309   1524    602     -4.4
310   1760   1028     -1.5
311   1600   1451      4.3
312    266   1406   <-35.0
313   1002   1365     -0.3
314   1316   1305     -7.3
315   1341    523     -7.0
318   1104   1053    -10.1
320   1480   1450     -4.0
321    850    603    -15.1
322   1454   1404     -5.3
323    670    626    -20.0
324    655    101    -20.6
325   1521    675     -4.4
326   1587    677     -3.6
327   1388    400      4.3
328    448   1201    -30.0
330   1608    751      4.3
331   1566    607     -3.8
332    531    471    -26.3
333    784   1156    -16.7
334   1050    407    -10 s
335   1503    303     -3.5
336   1616    598     -3.2
338   1854   1004     -0.6
339   1265    688      4.0
340    581    585    -23.6
341   1497   1047     -4.7
343   1351    265      4.6
344   1813    540     -0.0



31.000 
17,700 
44,600 
25.800 
177.400 
45,000 
63.400 
29.000 
31.900 
48,900 
36.300 
65.200 
31,700 
42.900 
49.900 
27,100 
53,700 
42.600 
51.300 
27.300 
54.800 
25.100 
37,400 
67.200 
46.100 
37,600 
50.700 
55.900 
13.900 
37.800 
62.000 
43.600 
48.700 
36,000 
31,300 
12.400 
45,300 
169.200 
20,400 
30,100 
10,300 
49,600 
30,900 
33.700 
40.700 
34.700 
29,400 
14,700 
16,100 
17.600 
16.600 
54.900 
28.500 
14,400 
49,100 
13.300 
47,700 
420.500 
44,800 
44.700 
67.000 
20.100 
40.900 
43.700 
59,600 
24.700 
67,300 
86.500 
49,400 
30.300 
34.900 
50.300 
28.700 
102.200 
52.800 
345 
346 
347 
348 
349 
350 
351 
352 
353 
354 
355 
356 
357 
358 
359 
360 
361 
362 
363 
364 
365 
366 
367 
366 
369 
370 
371 
372. 
373 
374 
375 
376 
377 
378 
379 
381 
382 
383 
384 
385 
386 
387 
388 
389 
390 
391 
392 
393 
394 
395 
396 
397 
399 
400 
401 
403 
404 
405 
406 
409 
410 
411 
412 
413 
415 
416 
417 
418 
419 
420 
421 
422 
423 
424 
425 



1006 
1095 
625 
361 
110 
521 
912 
1574 
961 
706 
1450 
1374 
474 
796 
764 
1384 
1713 
1161 
914 
412 
741 
£78 
1560 
963 
434 
639 
1567 
1875 
1351 
1506 
1623 
254 
1409 
621 
1017 
953 
856 
1252 
1699 
1042 
1490 
1554 
1193 
1374 
1456 
718 
1799 
1482 
1227 
1530 
1410 
912 
1465 
1473 
1029 
1516 
1495 
1525 
723 
650 
1501 
936 
350 
1033 
737 
1578 
646 
1695 
725 
1289 
1171 
599 
929 
739 
1490 



578 
640 
728 
963 
1343 
1130 
619 
530 
912 
762 
830 
1152 
9S7 
346 
338 
1068 
769 
859 
1156 
435 
486 
1503 
935 
520 
441 
610 
860 
762 
1059 
715 
532 
417 
583 
494 
595 
598 
674 
258 
1518 
493 
583 
603 
404 
902 
969 
690 
732 
758 
1461 
577 
755 
256 
1063 
450 
1140 
754 
554 
1092 
252 
663 
478 
1057 
1120 
538 
425 
606 
496 
482 
770 
1041 
912 
162 
856 
625 
965 



-11.9 
-10.3 
-21.7 
45.3 
<-35.0 
•26.7 
•13.9 
■3.7 
•12.9 
•18.9 
-5.3 
4.5 
-2B.7 
•16.3 
-17.3 
4.4 
-2.1 
-9.3 
•13.6 
-32.0 
•17.9 
-14.6 
-3.9 
•12.4 
-31.0 
-21.2 
•3.6 
■0.5 
4.8 
-4.6 
-0.9 
<-35.0 
4.1 
-21 .8 
•11.7 
-13.1 
•15.0 
4.1 
-2.3 
-11.2 
-4.7 
-4.0 
4.9 
4.5 
-5.2 
-18.5 
-1.1 
-4.8 
4.4 
-4.3 
4.0 
•13.9 
-5.0 
-4.9 
-11.5 
-4.4 
-4.7 
-4.3 
-18.4 
•20.8 
-4.6 
-13.4 
•35.9 
-11.4 
-18.0 
•3.7 
•21.0 
-2.3 
-18.3 
-7.7 
-9.1 
-22.8 
•13.6 
-17.9 
-4.7 



50,800 
46,800 
42.000 
31.100 
18.300 
25.700 
48,100 
54.300 
33,900 
40.400 
37.300 
24.900 
30,600 
77.800 
79.400 
27.900 
40,100 
36.100 
24,600 
63,700 
58.200 
13.000 
33,000 
55.200 
63.000 
48.700 
36.100 
40.400 
28,300 
42.700 
54.200 
65,900 
50.400 
57,500 
49.600 
49.400 
44.900 
105.300 
12.500 
57.500 
50.400 
49.100 
67,700 
34,300 
31,700 
44,000 
41,900 
40,600 
14.400 
50.800 
40.800 
106.400 
28,100 
61.900 
25.300 
40,800 
52.500 
27.100 
108.000 
45.500 
59.000 
26.300. 
26,000 
53.700 
64.900 
48.900 
57.300 
58,600 
40,000 
28,900 
33.900 
193,700 
36.200 
47.700 
31.600 
426 
427 
426 



1296 
810 
1565 



429 125© 

430 1253 



431 
432 
434 



734 
483 
518 



435 1020 

436 1122 

437 1870 

438 435 

439 86 

440 1740 



704 
643 

303 
847 

562 
1426 

433 
1041 
1170 

196 

673 
1102 



-7.« 
-16.0 
4.9 
4.0 
4.1 
•18.1 
-28.5 
-26.0 
-11.6 
-9.8 
-0.5 
-31.0 



441 
443 
446 



590 
743 
601 



847 <-35.0 
544 -1.8 



447 1050 

448 1245 

449 1576 

450 1818 

451 1094 

452 1045 

453 1652 



1571 
335 
668 
926 
1296 
1516 
1021 
440 
802 
894 



454 


1403 


500 


a r f 
45© 


1394 


718 


457 


905 


436 


459 


1038 


581 


460 


1598 


294 


461 


1528 


863 


462 


1098 


1137 


463 


849 


1125 


464 


1614 


1072 


465 


1388 


481 


466 


1 194 


1064 


466 


577 


467 




1140 


888 


470 




524 


471 


1293 


1133 


472 


616 


655 


473 


2009 


299 


474 


1205 


215 


475 


1035 


786 


476 


160 


155 


477 


469 


1370 


478 




662 


479 


1009 


540 


480 


1216 


235 


482 


816 


346 


483 


693 


673 


485 


1608 


1013 


486 


478 


599 


467 


1025 


607 


488 


1045 


1186 


489 


1609 


301 


490 


775 


1289 


491 


692 


178 


492 


1100 


964 


493 


1760 


776 


494 


882 


247 


495 


470 


1258 


496 


494 


1436 


497 


980 


852 


499 


1414 


546 


500 


1234 


1072 


501 


1246 


659 


502 


824 


792 


503 


1246 


1134 


504 


1115 


1407 


505 


1189 


391 


506 


1578 


402 


507 


787 


250 


508 


979 


552 


509 


1153 


619 


510 


1730 


1006 



•22.8 
-17.8 
•16.2 
-11.1 
4.2 
4.7 
-0.9 
-10.3 
>0.0 
•2.8 
4.1 
4.3 
•14.0 
-11.3 
-3.4 
-4.3 
-10.2 
-15.2 
-0.9 
4.3 
4.9 
•23.9 
-9.6 
-1.1 
-7.6 
-21.9 
>0.0 
4.7 
-11.4 
<-35.0 
-28.9 
•22.8 
-11.8 
4.6 
-15.9 
-19.3 
•3.3 
•28.6 
•11.5 
-11.2 
-3.3 
-17.0 
-19.3 
-10.2 
-1.6 
-14.5 
-26.9 
•28.1 
-12.5 
4.0 
4.3 
4.2 
-15.7 
4.2 
-9.9 
4.9 
•3.7 
•16.6 
-12.5 
4.4 
-2.0 



43X0 
36.800 
66.700 
36.600 
51,000 
15.500 
63.000 
26.000 
24.300 
147.600 
45.000 
26.700 
36,600 
53.200 
10.800 
80.100 
45.200 
33.300 
19.800 
12,600 
29.600 
63,100 
38.600 
34.600 
56.900 
42.600 
63.500 
50.500 
91,400 
35.900 
25.400 
25.800 
27.800 
58.700 
27X0 
60.100 
34.900 
54.800 
25,500 
46,000 
89.900 
131.300 
39.200 
207,600 
17,400 
45.600 
53.500 
117.400 
77.800 
44,900 
30.000 
49,300 
48,800 
23.700 
89,200 
20.100 
169,300 
31.800 
39,700 
110,700 
21,200 
15,200 
36,400 
53.100 
27,800 
45.700 
39.000 
25.500 
16.200 
69,700 
66,000 
109.000 
52.600 
48.100 
30.200 
MSN      X      Y   CPK pI     SDSMW
511    609    484    -16.0    ££.400
512   1099    533    -10.2    54,100
513   1696   1034       --    29,200
514    948    636    -13.2    47,100
515    461    543    -28.5    53,400
516   1334   1044     -7.1    26,800
517    868   1021    -14.8    29,700
518    798    779    -16.3    39,600
519    822    670    -15.7    45,100
520    632    165    -21.5   189,000
521   1332    630     -7.1    37,300
522    603   1104    -22.6    26,600
523   1190    309     -8.9    86,800
524    479   1226    -28.6    22,300
525    768   1066    -17.2    28,000
526    747   1016    -17.7    29,800
527   1170    231     -9.2   119,600
526   1502    542     -4.6    53,400
530   1728    620     -2.0    48,000
532    507   1011    -27.4    30,000
533    670    489    -14.7    57,900
534   1347   1065     -6.9    27,300
535     --   Xdtt     -4.5    77,800
536    306    OS*   <-35.0    46,000
538   1851     --     -0.7    44,100
539   1463    QA9     -5.1    31,100
540    909    561    -13.9    52,000
541    625    289    -21.7    93,100
542   1164    198     -9.2   146,200
543    803    655    -16.2    45,900
544   1259   1143     -8.0    25,200
545    856   1526    -15.0    12,200
546    803   1071    -16.2    27,600
547   1162    274     -9.3    98,400
548    128   1321   <-35.0    19,000
549   1355   1122     -6.8    25,900
550    595    866    -23.0    35,800
552   1369    494     -6.6    57,500
553    992    405    -12.2    67,600
555   1125    410     -9.8    66,900
557 1477 1030 -4.9 29,300 

558 960 583 -12.5 50,400 

559 700 1109 -19.1 26.400 

560 1028 621 -11.5 48.000 
562 896 794 -14.1 38,900 

564 789 1446 -16.6 14.900 

565 777 766 -16.9 40.200 

566 960 328 -12.5 81,900 

567 1519 611 -4.4 48.600 

569 1212 661 -8.6 45,600 

570 760 594 -17.4 49,700 

571 618 956 -21.9 32.100 

573 1142 771 -9.6 40,000 

574 532 787 -26.2 39.300 

575 771 250 -17.1 109,200 

576 1068 534 -10.8 54,100 

577 822 734 -15.7 41,800 
576 914 754 -13.8 40,800 
579 1064 794 -10.6 38.900 

560 1524 714 -4.4 42,800 

561 1392 783 -6.3 39.400 
582 982 686 -12.4 44.200 

584 1487 672 -4.8 45.000 

585 758 731 -17.4 41.900 

586 687 1152 -19.5 24.900 

587 930 523 -13.5 55.000 
568 1888 774 4.4 38,900 

589 642 485 -21.1 58,300 

590 1317 519 -7.3 55.300 

591 65 1548 <-35.0 11.500 

592 1014 614 -11.7 48,400 

593 732 176 -18.1 172.300 

594 1627 478 -3.0 59.000 

595 1009 1426 -11.8 15.500 



5S6 619 269 .21.9 100.500 

567 1176 461 -9.1 60,700 

598 1465 1044 -5.0 28,800 

599 741 1188 -17.9 23,600 

600 907 402 -14.0 68,000 

601 687 658 -19.5 45,800 

502 712 1138 -18.7 25,400 

503 898 181 -14.1 165.200 
604 783 1461 -16.7 14 400 
«* 736 223 .18.0 125,300 

606 629 273 -21.6 96,700 

607 1064 286 -10.6 94000 

683 503 -14.5 56.700 

609 2012 610 >0.0 48700 

610 1255 903 *.1 34*200 

612 1103 391 .10.1 69*600 

613 778 265 -16.9 102 000 

614 -£24 518 -15.7 55*400 

615 1095 195 -10.3 149*100 
€16 1759 478 -1.6 59 000 

617 994 372 -12.1 72900 

618 751 374 -17.6 72 400 

619 1429 518 -5.7 55300 

620 1050 520 -11.1 js'joo 

621 923 1105 -13.7 26*600 

622 1462 622 -5.1 47*900 

623 759 225 -17.4 124*000 

624 758 1038 -17.4 29*000 

625 1438 606 -5.5 48*900 

626 1096 1089 -10.2 27*200 

627 942 548 -13.3 S3 000 

628 809 621 -16.0 48 000 

629 899 979 -14.1 31*300 

630 1135 1 321 -9.6 19*100 

631 979 615 -12.5 48*300 

632 1 542 1076 -4.1 27*600 

633 1345 614 -6.9 38*000 

634 409 950 -32.2 32*400 

635 1165 704 -9.2 43 300 

636 774 604 -170 49 000 

637 1 263 524 -8.0 54*800 

638 952 411 -13.1 66*700 

639 1717 575 -2.1 51*000 

640 994 292 -12.1 92*000 

641 165 1 224 <-35.0 22*400 

642 603 251 -16.2 1 06 900 

643 719 296 -18.5 90 700 

644 1100 294 .10.2 91*400 

645 534 1 263 -26.1 21,000 

646 1153 1038 -9.4 29 000 

648 1246 204 -8.2 140 000 

649 14 1406 <-35.0 16 200 

650 1713 1 049 -2.1 28 600 

651 1986 1183 >0O 23800 

652 1378 816 -6.5 38000 

653 1442 1165 -5.5 24 400 

654 650 806 -20.8 38 400 

655 1111 551 -10.O 52.700 

656 1095 861 -10.3 36 000 

657 1524 540 .4.4 53*600 
656 1777 860 -1.4 36000 

659 391 584 -33 4 50*400 

660 977 565 -12.5 51>00 

661 658 1 66 -20,5 187,500 

662 732 312 -18.1 86,100 

663 1787 567 -1.2 5 1.500 

664 888 268 -14.4 100 900 

665 689 775 -14.3 sg'soo 

666 715 221 -18.6 126,300 

667 781 227 -16.8 122 400 

668 646 165 -21 0 189 100 

669 1116 353 -9.9 76*300 

670 1382 643 -6.4 46*600 

671 547 789 -25.3 39!200 
673 984 746 -12.4 41.200 
674 1661 446 -2.7 62,100 

III » MA £q£ 

676 708 642 .18.6 46 700 

€77 919 615 -13.7 48*300 

678 1065 551 -10.5 52.700 

679 600 923 -22.7 33 400 

680 1237 10O4 -8.3 30*300 

681 1103 283 -10.1 95*100 

682 1406 477 -6.1 59*100 

683 1596 249 -3.4 109*800 



•24.8 




-9.2 




0.0 




-4.1 


AO «m 

**».! CO 




40,300 


.1 1 A 


32.300 


■vfl ft 


100,200 


-lO.U 


34,900 




14,400 


>0.0 


37.800 


■3.0 


45,900 


-13.6 


107.000 


-0.6 


42,700 


>0.0 


78.000 


•13.0 


51.800 


-4.2 


42,000 


-23.8 


34.400 


-3.2 


51.900 


-7.8 


51.200 


-0.7 


43.300 


-11.7 


16.900 


-10.7 


25.100 


<-35.0 


34.800 


-18.5 


66,600 


X A 


36.800 


-7.1 


103,100 


.10 ^ 


63.900 


.16 ft 
- 1 w.O 


58,700 




43.600 


•23.9 


Alt AfV\ 


4& 


140.400 


•10.8 


60.400 


-7.9 


56,400 


-13.0 


37,700 


-17.3 


69.100 


-18.5 


33,700 


-4.9 


66,200 


-0.7 


59,400 


-27.3 


39,400 


•8.6 


25.800 


-0.6 


42.300 


-20.2 


40.300 


-7.2 


85.900 


-18.5 


64.600 


-10.2 


59,500 


-6.7 


51.400 


-19.2 


127,600 


•19.5 


67,000 


-8.7 


106.200 


•12.1 


51.900 


•14.1 


49,500 


-14.5 


165.900 



>u.u 44 200 

745 726 168 -18.3 183*600 

746 999 643 -12.0 46 600 

748 182 1503 <-35.0 1 3*000 

749 2005 649 >0.0 46*300 

750 1448 575 -5.4 51*000 

751 792 266 -16.5 101*900 

752 469 296 -28.9 90*600 

754 664 254 -20.3 1 07*000 

755 1195 184 -8.8 161*000 

756 1821 1113 -0.9 26*300 

757 909 246 -13.9 1 1l'oOO 
760 790 133 -16.5 264.900 
761 1390 733 

763 1416 1065 

764 2020 569 

765 651 475 

766 1052 1149 

767 1966 468 

768 1330 

769 IS 70 

770 857 

771 1337 
773 1576 



685 
613 
617 
974 
602 
824 
708 
458 
434 

779 700 411 

780 1052 1136 

784 1413 

785 1364 

786 1622 
893 
616 
451 
777 

793 1536 1543 
807 
546 

797 1126 212 

798 033 

799 1420 

800 1759 
624 
898 



775 969 

776 1438 

777 1 539 

778 650 



787 
790 
791 
792 



794 1461 
796 388 



801 
802 
803 
804 
805 
806 
807 
808 
809 
610 
811 
812 
813 



1775 1468 
573 196 
203 
980 

902 
625 



1039 
306 
827 
1851 1015 
440 573 



4.2 
•5.9 
>0.0 
-20.6 
•11.1 
>0.0 
•7.1 
>0.0 
•15.0 
-7.0 
-3.7 
-12.8 
•5.5 
-4.2 
-15.1 
-19.1 
-11.1 
-6.0 
-6.7 
-0.9 
-14.3 
-22.0 
-29.8 
•16.9 
-4.2 
-5.1 
-33.6 
-9.8 
-13.5 
-5.9 
-1.6 
-21.7 
-14.2 
-1.4 
•24.0 
494 <-35.0 
12.5 



529 
685 
635 
392 
882 
1429 
377 



437 
593 
279 
865 
.547 



1358 
851 



249 
393 



745 1246 



814 2028 810 



815 


1086 


645 


816 


629 


313 


817 


1376 


1177 


818 


1771 


790 


819 


1045 


263 


820 


984 


362 


621 


1712 


279 


822 


1256 


205 


623 


1517 


654 


824 


1442 


449 


825 


1240 


513 


626 


1309 


1014 


827 


2012 


708 


828 


937 


1405 


630 


1342 


756 


631 


562 


626 


832 


1073 


1039 


833 


481 


620 


634 


501 


581 


837 


751 


748 


838 


635 


833 


639 


1494 


459 


640 


1952 


301 


641 


1585 


1060 


842 


571 


1312 


643 


1325 


649 


644 


1727 


301 


845 


630 


679 


646 


2016 


905 


847 


673 


1200 



•14.1 
-21.7 
-0.7 
•30.9 
-6.8 
-15.1 
-17.8 
>0.0 
•10.4 
-21.6 
-6.5 
-1.4 
•11.2 
-12.4 
-2.2 
-6,1 
•4.4 
-5.5 
-6.3 
-7.4 
>0.0 
-13.4 
-7.0 
-24.5 
•10.7 
•28.5 
-27.8 
-17.6 
•21 J 
•4.7 
>0.0 
-3.6 
-24.1 
-7.2 
-2.0 
-21.5 
>0.0 
•19.9 



41.600 
27,300 
51.400 
56,300 
25,000 
59.900 
44.300 
46.500 
46.200 
31.500 
56.700 
37,600 
43,100 
61.000 
63.800 
66.800 
25.500 
54,400 
35.000 
37,100 
69.500 
35.100 
15.400 
72.000 
11.700 
38,300 
53,100 
133,700 
63.400 
49,600 
96,500 
35.800 
53,000 
14.200 
146.400 
57.400 
29,000 
67,200 
37.500 
29.900 
51.100 
109.700 
69.400 
21,600 
36,200 
46.500 
85.700 
24,000 
39,100 
103.100 
74,600 
96.700 
139.200 
46,000 
62.000 
55.800 
29.900 
43.100 
16.200 
40,700 
37.500 
29.000 
37,800 
50.500 
41,100 
37.200 
60,900 
69,300 
27,500 
19.400 
46,300 
69.200 
44.600 
34,200 
23,200 
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648 


1863 


271 


649 


1166 


523 


650 


1535 


1024 


851 


1035 


826 


852 


634 


542 


655 


499 


220 


656 


1063 


194 


857 


667 


890 


658 


1446 


639 


659 


706 


311 


860 


1070 


1066 


861 


472 


347 


862 


674 


480 


664 


1307 


499 


865 ' 


645 


867 


866 


627 


1004 


868 


665 


494 


869 


1807 


402 


670 


1323 


763 


671 


1228 


1031 


672 


1904 


346 


673 


S56 


647 


674 


1540 


756 


675 


1566 


777 



876 1196 

677 1076 

878 1161 

679 647 

880 1756 

681 1543 

863 1432 

664 922 

885 1103 

686 1501 



887 
888 
889 
890 
891 
892 
894 



796 
636 
951 
717 
1123 
891 
1245 



895 1962 
696 1322 



897 
898 
899 
900 
901 
903 
904 
905 
907 
908 
910 
911 



420 
662 
645 
624 

931 
799 
765 
775 



351 
720 
1111 
757 
594 
278 
690 
689 
414 
607 
1103 
634 
759 
546 
229 
413 
234 
346 
626 
570 
428 
243 
703 
1094 
229 
520 



828 
681 
1544 



913 1606 

914 1237 

916 1442 

917 1260 
919 
920 1133 



824 
1303 
1544 
301 
387 
668 
749 
367 

764 1 541 



921 
923 



1123 
629 



924 1131 

925 1441 

926 679 

927 1467 

928 1082 
829 1231 



931 
832 
933 
934 
936 



1609 
810 
965 
947 
865 



937 1421 



1123 
380 
242 
318 
874 
219 
1191 
775 
816 
670 
900 
£20 
462 
843 
1056 



-0.6 
-9.2 
-4.2 
-11.4 
-15.5 
-27.8 
•10.9 
-14.4 
-5.4 
-18.9 
-10.7 
•28.8 
-19.9 
-7.4 
-21.0 
. -15.6 
-19.5 
-1.0 
•7.2 
-6.4 
-0.3 
-24.8 
•4.2 
-3.8 
-8.8 
-10.6 
-9.3 
-20.9 
-1.6 
•4.1 
-5.7 
-13.7 
-10.1 
-4.6 
-16.3 
-21.3 
-13.1 
•18.6 
-9.8 
-14.3 
•6.2 
>0.0 
-7.2 
-31.4 
-20.3 
-15.3 
•21.7 
•13.5 
-16.3 
-17.2 
•17.0 
-14.4 
-15.6 
•19.7 
-4.1 
-3.3 
-6.3 
-5.5 
-6.0 
-17.3 
-9.7 
-9.8 
-15.6 
-9.7 
-5.5 
-19.7 
-4.6 
-10.5 
-6.4 
-3.3 
-16.0 
•12.6 
•13.2 
•14.8 
•5.9 



99.500 
54.900 
29.600 
37.500 
53.400 
127.100 
150.500 
34.800 
46,900 
86.200 
28.000 
77,600 
56.800 
57,000 
34.900 
30,300 
57,400 
66,000 
39.400 * 
29.300 
77,700 
46,400 
40.700 
39.700 
76.800 
42.500 
26.400 
40.700 
49,700 
97,100 
34.800 
44,100 
66,400 
46.900 
26.600 
47.200 
40.600 
52.900 
121.200 
66,400 
117,800 
77.700 
47.700 
51,300 
64,500 
113,000 
43,400 
27,000 
121,000 
55,200 
34,600 
37.600 
19,700 
11,700 
89,100 
70.400 
44.100 
41,100 
73.700 
11,700 
25,900 
71,500 
113,200 
64,300 
35.400 
128,200 
23,500 
39,800 
38,000 
45,100 
34.400 
55.100 
60.600 
36.600 
28.400 



939 


1197 


827 


-83 


941 


1765 


.865 


-13 


942 


602 


472 


-22.7 


943 


312 


496 




944 


993 


491 


-12.1 


945 


1300 


269 


•73 


946 


630 


423 


-21.6 


947 


187 


736 


<-35.0 


948 


1380 


344 




949 


1766 


665 


-1.5 


950 


1038 


193 


-11.3 


951 


660 


152 


•14.9 


952 


957 


701 


-13.0 


954 


503 


547 


•27.6 


955 


1938 


712 


>0.0 


957 


1010 


616 


-11.8 



960 
961 
962 
963 
964 
965 



768 
596 
557 
887 
564 
969 
671 



966 1204 

967 910 



968 
969 1285 



174 
419 
409 

320 
334 
1155 
255 
798 
154 

609 1048 



970 
971 
972 
974 
975 



822 
976 
403 
279 
844 



976 1124 

977 994 

978 1612 
979 



206 
232 
437 
567 
495 
981 
295 
664 
642 
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37300 
35,000 
50.600 
57,100 
57.700 
100.300 
65.100 
41.600 
78.200 
45.400 
151.000 
213.000 
43.400 
53.000 
42.900 
37.900 
174,900 
65.700 
67.100 
83.900 
80.500 
24.800 
106.600 
38.700 
210.300 
28.700 
138.900 
119,300 
63,400 
51.600 
57.400 
31.200 
91.100 
45.400 
46.700 
25,300 
46.700 
33.900 
12.600 
84,700 
26.600 
24.600 
52.400 
74.900 
84.500 
33.300 
43.400 
38.200 
60.700 
36.600 
50.700 
56.500 
93,100 
92.700 
40.000 
56.900 
23,700 
58.100 
96,400 
46.600 
41,200 
53.500 
45,600 
25.800 
47.200 
30,700 
25,500 
65.000 
41,300 
22.500 
58,400 
591.300 
84,600 
62.400 
41.500 



960 1064 

961 1197 



749 1141 
642 



911 

963 1762 1506 

984 1344 

985 1024 



987 
968 
990 
991 
992 



317 
1105 



785 
1159 
1090 
1030 
847 
902 

996 688 

997 1815 
996 1205 
999 617 



739 1159 
816 555 



994 

995 



1000 
1001 



968 
970 



1002 1736 
1003 
10O6 



361 
317 
928 
701 
811 
461 
847 
579 
504 
289 
290 
771 
478 



1007 
1009 



643 1184 
822 487 



875 
291 



1010 1386 

1011 459 

1012 679 

1013 1816 

1014 1032 

1015 1629 

1016 1311 

1017 1722 

1018 1015 
1020 
1021 

1022 1129 

1023 812 

1024 785 

1025 1290 



279 
644 
745 
541 
661 
1128 
634 
994 
1134 
424 
743 

1574 1219 
781 484 



63 
317 
446 

739 



•17.2 
•23.0 
•24.8 
-14.4 
•24.5 
-12.8 
-20.0 
-8.7 
-13.9 
•22.3 
-7.7 
-15.8 
•12.6 
-32.6 
<-35.0 
-15.3 
-9.8 
-12.1 
•3.2 
•17.7 
-10.8 
-6.8 
-1.6 
-6.9 
-11.5 
-17.9 
-15.9 
-16.7 
-9.3 
-10.4 
-11.5 
-15.2 
-14.1 
-14.4 
-09 
-8.7 
-22.0 
-12.8 
-12.7 
-1.9 
•21.1 
•15.8 
-14.6 
*-35.0 
-6.4 
•29.4 
-19.7 
-0.9 
•11.4 
-3.0 
-7.4 
-2.0 
•11.7 
-3.7 
•16.8 
-9.7 
-15.9 
-16.7 
-7.7 
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1026 
1027 
1028 



405 

1296 
856 



1030 12S4 

1031 986 

1032 1547 



552 
648 
547 
226 
622 
403 



1033 


1381 


551 


1034 


1525 


496 


1035 


1128 


645 


1036 


1226 


274 


1030 


1761 


262 


1040 


541 


639 


1041 


818 


910 


1044 


1036 


485 


1045 


1439 


407 


1047 


1540 


250 


1048 


1576 


635 


1046 


1089 


411 


1050 


949 


1040 


1051 


426 


818 


1052 


1583 


1385 


1053 


779 


1092 


1054 


1613 


620 


1055 


1380 


377 


1056 


284 


663 


1056 


1261 


746 


1060 


393 


605 


1061 


1817 


645 


1062 


1245 


746 


1064 


1258 


792 


1065 


705 


934 


1066 


1181 


734 


1067 


529 


656 


1066. 


508 


696 


1069 


1898 


604 


1071 


873 


609 


1073 


1768 


1128 


1075 


836 


773 


1076 


1863 


861 


1078 


826 


566 


1081 


971 


463 


1083 


1697 


202 


1085 


1157 


794 


1090 


620 


910 


1092 


1867 


567 


1093 


2019 


894 


1094 


1546 


538 


1095 


1545 


477 


1098 


61 


935 


1099 


1954 


237 


1101 


586 


1048 


1102 


1050 


667 


1103 


457 


797 



1105 1864 

1106 1714 



1107 1717 
1106 1976 



1111 

1112 1348 



532 
649 
546 

722 

547 1066 



1115 1385 

1116 1078 

1117 975 

1118 1202 



621 
762 
816 
787 
933 

1119 1022 1076 

1120 1905 616 

1121 1512 1301 

1122 1114 677 

1123 1464 452 

1125 1048 857 

1126 1122 802 
1128 1722 892 
1133 1098 825 
1139 1830 569 

1147 764 1162 

1148 1968 724 



-323 
-7.5 
-15.0 
-7.7 
-12.3 
-4.1 
-6.4 
-4.3 
-9.7 
-€.5 
-1.6 
-25.7 
-15.6 
-11.3 
-5.5 
-4.2 
-3.7 
-10.4 
-13.2 
-31.1 
-3.6 
•16.8 
-3.2 
-6.5 
<-35.0 
-6.0 
-33.3 
-0.9 
-8.2 
-8.1 
-18.9 
-9.0 
•26.3 
-27.4 
-0.3 
-14.7 
-1.5 
-15.4 
-0.6 
-15.7 
-12.7 
-2.3 
-9.4 
-21.9 
-0.5 
>0.0 
-4.1 
-4.1 
<-35.0 
>0.0 
-23.3 
-11.1 
-29.5 
-0.4 
-2.1 
-2.1 
>0.0 
-25.3 
-6.9 
-6.4 
-10.6 
-12.6 
^.7 
-11.6 
•OJ 
-4.5 
-6.9 
-5.1 
-11.1 
•8.8 
•2.1 
-10.2 
-0.8 
-17.3 
>0.0 



52.600 
36.500 
53,000 
123.200 
37.700 
67.900 
52.700 
57,200 
46,500 
S6.300 
1C3.600 
36,900 
34.000 
58.300 
67,300 
109.200 
47.100 
66,700 
26.900 
37,800 
16,900 
27,000 
48,000 
72.000 
45.500 
41,200 
49.000 
46,600 
41.200 
39,000 
33.000 
41.800 
45.800 
43,700 
49.100 
48.700 
25.800 
39,900 
36.000 
51,600 
58.500 
142.300 
38,900 
34.000 
49.500 
34.600 
53.700 
59,100 
33.000 
116.000 
28,600 
45,200 
38,800 
54.200 
46,300 
53,100 
42,400 
28.000 
48.000 
40,400 
38,000 
39.300 
33,100 
27,600 
48.300 
19,700 
44.700 
61.700 
36,200 
38.600 
34.700 
37.500 
51,400 
23.800 
42.300 



1153 921 1158 -13.7 24,700
1154 1594 664 -3.5 35,900
1161 637 400 -21.3 66,400
1162 623 397 -21.8 68,800
1163 665 397 -20.2 68,700
1168 564 528 -24.4 54,500
1170 552 529 -25.0 54,500
1171 538 524 -25.9 54,800
1172 545 514 -25.5 55,700
1174 1099 522 -10.2 55,000
1176 1304 586 -7.5 50,200
1177 1366 539 -6.6 53,700
1178 1608 702 -3.3 43,400
1179 1485 224 -4.8 124,900
1180 1459 224 -5.2 124,900
1181 1431 223 -5.7 125,100
1182 1407 223 -6.1 125,200
1183 1383 224 -6.4 124,700
1184 1454 182 -5.3 164,400
1185 1422 163 -5.8 162,600
1186 1394 182 -6.3 164,300
1189 1171 214 -9.2 131,800
1190 1457 286 -5.2 94,200
1191 686 1114 -19.5 26,200
1192 265 893 <-35.0 34,700
1193 403 1292 -32.6 20,000
1194 344 1275 <-35.0 20,600
1195 505 1311 -27.6 19,400
1196 572 1293 -24.1 20,000
1197 639 1502 -21.2 13,000
1198 637 1402 -21.3 16,300
1199 614 1407 -22.1 16,200
1200 637 1431 -21.3 15,400
1201 1095 1394 -10.3 16,600
1202 1719 1545 -2.1 11,600
1203 791 668 -16.5 45,200
1204 964 1021 -12.9 29,700
1205 313 195 <-35.0 148,700
1208 306 194 <-35.0 1 4G.B00
1209 320 197 <-35.0 147,400
1210 326 197 <-35.0 146,600
1211 394 294 -33.2 91,400
1212 402 294 -32.7 91,200
1214 386 294 -33.7 91,400
1215 641 329 -21.2 81,600
1216 660 329 -20.4 81,600
1217 914 266 -13.8 101,800
1218 873 245 -14.7 112,000
1219 970 372 -12.7 72,900
1220 1021 298 -11.6 90,100
1221 1392 205 -6.3 139,500
1222 1354 203 -6.8 141,800
1223 1362 205 -6.7 139,500
1224 673 540 -19.9 53,600
1225 614 542 -22.1 53,400
1226 603 539 -22.6 53,600
1227 696 623 -19.2 47,800
1228 707 628 -18.9 47,500
1229 475 447 -28.7 62,300
1230 466 1282 -29.0 20,400
1231 759 1461 -17.4 14,400
1232 1324 1170 -7.2 24,200
1233 1583 1005 -3.6 30,300
1234 1865 609 -0.6 38,200
1235 1812 817 -1.0 37,900
1236 1411 703 -6.0 43,400
1237 1392 682 -6.3 44,500
1238 794 410 -16.4 66,900
1239 769 407 -17.1 67,300
1240 740 406 -17.9 67,500
1241 743 511 -17.8 55,900
1242 713 510 -18.7 56,000
1243 682 509 -19.6 56,100
1244 663 504 -20.3 56,500
1245 565 582 -24.4 50,500
1246 547 577 -25.3 50,800
1247 530 576 -26.4 50,900
1249 516 572 -27.0 5lS»
1250 973 536 -12.7 53,900
1251 607 532 -22.4 54^00
1252 665 529 -20.2 54,400
1253 899 766 -14.1 40,200
1254 1311 746 -7.4 41,200
1255 1300 761 -7.5 40,400
1257 1938 712 0.0 42,900
1258 1806 718 -1.0 42,600
1259 1727 715 -2.0 42,700
1260 1629 713 -3.0 42,800
1261 1555 717 -4.0 42,600
1262 1468 717 -5.0 42,600
1263 1413 722 -6.0 42,400
1264 1340 717 -7.0 42,600
1265 1263 717 -8.0 42,600
1266 1182 720 -9.0 42,500
1267 1110 717 -10.0 42,600
1268 1055 717 -11.0 42,600
1269 999 717 -12.0 42,600
1270 959 715 -13.0 42,700
1271 905 712 -14.0 42,900
1272 857 714 -15.0 42,800
1273 810 705 -16.0 43,300
1274 774 711 -17.0 42,900
1277 737 708 -18.0 43,100
1278 702 711 -19.0 42,900
1279 671 710 -20.0 43,000
1280 645 710 -21.0 43,000
1281 617 707 -22.0 43,100
1282 595 704 -23.0 43,300
1283 573 700 -24.0 43,500
1284 552 695 -25.0 43,700
1285 536 694 -26.0 43,800
1286 515 687 -27.0 44,200
1287 496 683 -28.0 44,400
1288 467 669 -29.0 45,200
1289 447 667 -30.0 45,300
1290 427 655 -31.0 45,900
1291 412 655 -32.0 45,900
1292 397 652 -33.0 46,100
1293 381 654 -34.0 46,000
1294 365 653 -35.0 46,100
1295 348 653 <-35.0 46,100
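
The rows for spots 1257 through 1294 above appear to be the carbamylated CPK charge standards: each sits at a whole-number position (0.0 to -35.0) and records the gel X coordinate for that position. The CPK-relative pI of any other spot can then be read off from its X coordinate by interpolating between the two flanking standards. The sketch below assumes plain linear interpolation; the listing does not state the method actually used, and the names `CPK_STANDARDS` and `cpk_position` are illustrative only. Only the first fifteen standards are copied in, to keep the example short.

```python
from bisect import bisect_left

# (gel X coordinate, CPK charge position), copied from the standard rows
# of the listing above (spots 1257-1271). Subset chosen for brevity.
CPK_STANDARDS = [
    (1938, 0.0), (1806, -1.0), (1727, -2.0), (1629, -3.0), (1555, -4.0),
    (1468, -5.0), (1413, -6.0), (1340, -7.0), (1263, -8.0), (1182, -9.0),
    (1110, -10.0), (1055, -11.0), (999, -12.0), (959, -13.0), (905, -14.0),
]


def cpk_position(x, standards=CPK_STANDARDS):
    """Linearly interpolate a CPK charge position from a gel X coordinate.

    Assumption: linear interpolation between neighbouring standards.
    """
    points = sorted(standards)              # ascending X
    xs = [px for px, _ in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("X lies outside the range covered by the standards")
    i = bisect_left(xs, x)
    if xs[i] == x:                          # exactly on a standard
        return points[i][1]
    (x0, c0), (x1, c1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (x - x0) / (x1 - x0)


if __name__ == "__main__":
    for spot, x in [(1237, 1392), (1153, 921), (1154, 1594)]:
        print(spot, round(cpk_position(x), 1))
```

Under this assumption, X = 1392 interpolates to about -6.3, X = 921 to about -13.7 and X = 1594 to about -3.5, which agrees with the values listed for spots 1237, 1153 and 1154.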
Table 3. Computed pI's of two sets of carbamylated protein standards: Rabbit muscle CPK and human
hemoglobin (Hb)

                  PIR    #ASP  #GLU  #HIS  #LYS  #ARG  NH2-  Calc  Real
Protein Name      Name   3.9   4.1   6.0   10.8  12.5  7.0   pI    CPK



0 
-1 
-2 
-3 
-4 
-5 
-6 
-7 
-8 
-9 
-10 
-11 
-12 
-13 
-14 
-15 
-16 
-17 
-18 
-19 
-20 
-21 
-22 
-23 
-24 
-25 
-26 
-27 
-28 
•29 
-30 
-31 
-32 
-33 
-34 
-35 



0 

-1 

-2 

-3 

-4 

-5 

-6 

-7. 

-8 • 

-9 
-10 
-11 
-12 



Rabbit muscle CPK KlfiBCM 



Hb-beta, human 



HBHU 



28 


27 


17 


34 


18 


28 


27 


17 


33 


18 


28 


27 


17 


32 


18 


28 


27 


17 


31 


18 


28 


27 


17 


30 


18 


28 


27 


17 


29 


18 


28 


27 


17 


28 


18 


28 


27 


17 


27 


18 


28 


27 


17 


26 


18 


28 


27 


17 


25 


18 


28 


27 


17 


24 


18 


28 


27 


17 


23 


18 


28 


27 


17 


22 


18 


28 


27 


17 


21 


18 


28 


27 


17 


20 


18 


28 


27 


17 


19 


18 


28 


27 


17 


18 


18 


28 


27 


17 


17 


18 


28 


27 


17 


16 


18 


28 


27 


17 


15 


18 


28 


27 


17 


14 


18 


28 


27 


17 


13 


18 


28 


27 


17 


12 


18 


28 


27 


17 


11 


18 


28 


27 


17 


in 


A O 
IO 


28 


27 


17 


0 
y 


18 


28 


27 


1 / 


0 
0 


18 


28 


27 


17 


7 


IB 


28 


27 


17 


0 


18 


28 


27 


17 


c 
3 


4 0 
10 


28 


27 


17 


A 
•1 


1 D 
10 


28 


27 


17 


O 


IO 


28 


27 


17 


c. 


1 O 


28 


27 


17 


1 

1 




28 


27 


17 


n 


IB 
10 


28 


27 


17 


n 

V 


IO 


7 


8 


9 


11 


3 


7 


8 


9 


10 


3 


7 


8 


9 


9 


3 


7 


8 


9 


8 


3 


7 


8 


9 


7 


3 


7 


8 


9 


6 


3 


7 


6 


9 


5 


3 


7 


8 


9 


4 


3 


7 


8 


9 


3 


3 


7 


8 


9 


2 


3 


7 


8 


9 


1 


3 


7 


8 


9 


0 


3 


7 


8 


9 


0 


3 



6.84 

6.67 

6.54 

6.42 

6.31 

6.21 

6.12 

6.03 

5.94 

5.85 

5.76 

5.67 

5.58 

5.48 

5.39 

5.29 

5.20 

5.12 

5.04 

4.96 

4.89 

4.83 

4.77 

4.71 

4.66 

4.61 

4.56 

4.52 

4.48 

4.44 

4.40 

4.36 

4.32 

4.29 

4.25 

A22 



7.18 
6.79 
6.53 
6.32 
6.13 
5.96 
5.78 
5.59 
5.37 
5.14 
4.91 
4.71 
4.54 



0.0 

-1 

2 
-3 
-4 

-5 
-6 
-7 
-8 
-9 
-10 
-11 
•12 
•13 
-14 
-15 
-16 
-17 
-18 
-19 

-20 

-21 

-22 

-23 

-24 

-25 

-26 

-27 

-28 

-29 

-30 

-31 

-32 

-33 

-34 

-35 



-1.8 
-3.2 
-5.3 
-72 
-10.0 
-12.3 
•15.5 
-18.0 
-21.0 
-25.5 
-27.2 
Table 4. Computed pIs of some known proteins related to measured CPK pIs

No.  Protein Name                                        PIR Name  #ASP  #GLU  #HIS  #LYS  #ARG  Calc pI  Real CPK
                                                                    3.9   4.1   6.0  10.8  12.5

  0  Creatine phospho kinase (CPK), rabbit muscle        KIRBCM      28    27    17    34    18     6.84       0.0
  1  Fatty acid-binding protein, rat hepatic             FZRTL        5    13     2    16     2     7.83      -3.0
  2  b2-microglobulin, human                             MGHUB2       7     8     4     6     5     6.09      -5.0
  3  Carbamoyl-phosphate synthase, rat                   SYRTCA      72    96    28    95    56     5.97      -5.5
  4  Prealbumin (serum albumin precursor), rat           ABRTS       32    57    15    53    27     5.98      -6.2
  5  Serum albumin, rat                                  ABRTS       32    57    15    53    24     5.71      -9.0
  6  Superoxide dismutase (Cu-Zn, SOD), rat              A26810       8    11    10     9     4     5.91      -9.2
  7  Phospholipase C, phosphoinositide-specific (?), rat A26807      34    42     9    49    21     5.92      -9.2
  8  Albumin, human                                      ABHUS       36    61    16    60    24     5.70     -11.9
  9  Apo A-I lipoprotein, rat                            A24700      18    24     6    23    12     5.32     -13.7
 10  proApo A-I lipoprotein, human                       LPHUA1      16    30     6    21    17     5.35     -14.3
 11  NADPH cytochrome P-450 reductase, rat               RDRT04      41    60    21    38    36     5.07     -15.6
 12  Retinol binding protein, human                      VAHU        18    10     2    10    14     5.04     -16.9
 13  Actin beta, rat                                     ATRTC       23    26     9    19    18     5.06     -17.2
 14  Actin gamma, rat                                    ATRTC       20    29     9    19    18     5.07     -16.8
 15  Apo A-I lipoprotein, human                          LPHUA1      16    30     5    21    16     5.10     -17.5
 16  Apo A-IV lipoprotein, human                         LPHUA4      20    49     8    28    24     4.88     -19.7
 17  Tubulin alpha, rat                                  UBRTA       27    37    13    19    21     4.66     -19.8
 18  F1 ATPase beta, bovine                              PWBOB       25    36     9    22    22     4.80     -21.0
 19  Tubulin beta, pig                                   UBPGB       26    36    10    15    22     4.49     -22.5
 20  Protein disulphide isomerase (PDI), rat hepatic     ISRTSS      43    51    11    51     9     4.07     -25.0
 21  Cytochrome b5, rat                                  CBRT5       10    15     6    10     4     4.59     -26.0
 22  Apo C-II lipoprotein, human                         LPHUC2       4     7     0     6     1     4.44     -30.5

Amino acid pI assumed in calculation: ASP 3.9, GLU 4.1, HIS 6.0, LYS 10.8, ARG 12.5
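
The table gives only the residue counts and the assumed ionization constants, not the procedure used to obtain the "Calc pI" column. One common way to compute a pI from such inputs is to sum Henderson-Hasselbalch charges over the ionizable groups and search for the pH at which the net charge is zero. The sketch below does that in Python. It is an assumption about the general approach, not the authors' documented method, so its results need not match the tabulated values exactly. The function names, the bisection search, and the count of one free amino terminus (taken at 7.0, the value in the Table 3 header) are all illustrative choices.

```python
# Assumed ionization constants, as listed with the tables
# (NH2- 7.0 comes from the Table 3 header).
PK_ACIDIC = {"ASP": 3.9, "GLU": 4.1}
PK_BASIC = {"HIS": 6.0, "LYS": 10.8, "ARG": 12.5, "NH2": 7.0}


def net_charge(counts, ph):
    """Net charge at a given pH from Henderson-Hasselbalch fractions."""
    negative = sum(
        n / (1.0 + 10.0 ** (PK_ACIDIC[res] - ph))
        for res, n in counts.items() if res in PK_ACIDIC
    )
    positive = sum(
        n / (1.0 + 10.0 ** (ph - PK_BASIC[res]))
        for res, n in counts.items() if res in PK_BASIC
    )
    return positive - negative


def isoelectric_point(counts, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection search for the pH where the net charge crosses zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(counts, mid) > 0:
            lo = mid          # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Residue counts for rabbit muscle CPK from row 0 of Table 4;
    # the single NH2 terminus is an assumption.
    cpk = {"ASP": 28, "GLU": 27, "HIS": 17, "LYS": 34, "ARG": 18, "NH2": 1}
    print(round(isoelectric_point(cpk), 2))
    # Blocking one lysine (as carbamylation does) shifts the pI downward.
    print(round(isoelectric_point(dict(cpk, LYS=33)), 2))
```

Removing lysines one at a time in this model produces a descending series of pI values, which is the behaviour the carbamylated CPK standards rely on.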





Indexed in: BIOSIS,
Current Contents, MEDLARS
ISSN 0173-0835

ELCTDN 12 (11) 763-996 (1991)



An International Journal



TWO-DIMENSIONAL GEL PROTEIN DATABASES
Editor: J. E. Celis



Editorial

J. E. Celis, H. Leffers, H. H. Rasmussen, P. Madsen, B. Honoré, B. Gesser,
K. Dejgaard, E. Olsen, G. P. Ratz, J. B. Lauridsen, B. Basse, A. H. Andersen,
E. Walbum, B. Brandstrup, A. Celis, M. Puype, J. Van Damme and
J. Vandekerckhove
    The master two-dimensional gel database of human AMA cell proteins: Towards
    linking protein and genome sequence and mapping information (Update 1991)

J. E. Celis, P. Madsen, H. H. Rasmussen, H. Leffers, B. Honoré, B. Gesser,
K. Dejgaard, E. Olsen, N. Magnusson, J. Kiil, A. Celis, J. B. Lauridsen,
B. Basse, G. P. Ratz, A. H. Andersen, E. Walbum, B. Brandstrup,
P. S. Pedersen, N. J. Brandt, M. Puype, J. Van Damme and J. Vandekerckhove
    A comprehensive two-dimensional gel protein database of noncultured
    unfractionated normal human epidermal keratinocytes: Towards an integrated
    approach to the study of cell proliferation, differentiation and skin
    diseases                                                                802

H. H. Rasmussen, J. Van Damme, M. Puype, B. Gesser, J. E. Celis and
J. Vandekerckhove
    Microsequencing of proteins recorded in human two-dimensional gel protein
    databases                                                               873

N. L. Anderson and N. G. Anderson
    A two-dimensional gel database of human plasma proteins                 883

N. L. Anderson, R. Esquer-Blasco, J.-P. Hofmann and N. G. Anderson
    A two-dimensional gel database of rat liver proteins useful in gene
    regulation and drug effects studies                                     907

P. J. Wirth, L.-di Luo, Y. Fujimoto, H. C. Bisgaard and A. D. Olson
    The rat liver epithelial (RLE) cell protein database                    931

R. A. VanBogelen and F. C. Neidhardt
    The gene-protein database of Escherichia coli: Edition 4                955

Miscellaneous                                                               995



1 



For submission of papers, see Instructions to Authors (last page of this issue)



Electrophoresis 1991, 12, 907-930

N. Leigh Anderson
Ricardo Esquer-Blasco
Jean-Paul Hofmann
Norman G. Anderson

Large Scale Biology Corporation,
Rockville, MD



ROCKETT EXHIBIT L 

Docket No.: PF-0300-3 CON
USSN: 09/745,506 



Database of rat liver proteins



907



A two-dimensional gel database of rat liver proteins
useful in gene regulation and drug effects studies 

A standard two-dimensional (2-D) protein map of Fischer 344 rat liver
(F344MST3) is presented, with a tabular listing of more than 1200 protein species.
Sodium dodecyl sulfate (SDS) molecular mass and isoelectric point have been es-
tablished, based on positions of numerous internal standards. This map has been
used to connect and compare hundreds of 2-D gels of rat liver samples from a va-
riety of studies, and forms the nucleus of an expanding database describing rat
liver proteins and their regulation by various drugs and toxic agents. An example
of such a study, involving regulation of cholesterol synthesis by cholesterol-lower-
ing drugs and a high-cholesterol diet, is presented. Since the map has been ob-
tained with a widely used and highly reproducible 2-D gel system (the Iso-Dalt®
system), it can be directly related to an expanding body of work in other laborato-
ries.
1 Introduction

High-resolution two-dimensional electrophoresis of pro-
teins, introduced in 1975 by O'Farrell and others [1-4], has
been used over the ensuing 16 years to examine a wide va- 
riety of biological systems, the results appearing in more 
than 5000 published papers. With the advent of computer- 
ized systems for analyzing two-dimensional (2-D) gel ima- 
ges and constructing spot databases, it is also possible to 
plan and assemble integrated bodies of information de- 
scribing the appearance and regulation of thousands of pro- 
tein gene products [5, 6]. Creating such databases involves 
amassing and organizing quantitative data from thousands 
of 2-D gels, and requires a substantial commitment in tech- 
nology and resources. 

Given the long-term effort required to develop a protein da- 
tabase, the choice of a biological system takes on consider- 
able importance. While in vitro systems are ideal for answer- 
ing many experimental questions, especially in cancer re- 
search and genetics, our experience with cell cultures and 
tissue samples suggests that some in vivo approaches could 
have major advantages. In particular, we have noticed that 
liver tissue samples from rats and mice appear to show grea-
ter quantitative reproducibility (in terms of individual pro-
tein expression) than replicate cell cultures. This is perhaps 
a natural result of the homeostasis maintained in a com-
plete animal vs. the well-known variability of cell cultures,
the latter due principally to differences in reagents (e.g.,
fetal bovine serum), conditions (e.g., pH) and genetic evo-
lution of cell lines while in culture. It is also more difficult
to generate adequate amounts of protein from cell culture 
systems (particularly with attached cells), forcing the inves- 
tigator to resort to radioisotope-based or silver-based stain- 
detection methods. While these methods are more sensi- 
tive (sometimes much more sensitive) than the Coomassie 
Brilliant Blue (CBB) stain typically used for protein detec-
tion in "large" protein samples, they are generally more vari-
able, more labor-intensive and, in the case of radiographic
methods, may generate highly "noisy" images, due to the
properties of the films used. By contrast, large protein sam- 
ples can easily be prepared from liver using urea/Nonidet 
P-40 (NP-40) solubilization and stained with CBB, which 
has the advantage of being easily reproducible [8]. Finally, 
there remains the question of the "truthfulness" of many in 
vitro systems as compared to their in vivo analogs; how 
great are the changes caused by the introduction into a cul-
ture and the associated shift to strong selection for growth,
and how do these affect experimental outcomes? Hence 
the apparent advantages of in vitro systems, in terms of ex-
perimental manipulation, may be counterbalanced by
other factors relating to 2-D data quality. 

There is a second important class of reasons for exploring 
the use of an in vivo biological system such as the liver. His- 
torically, there have been two broad approaches to the me- 
chanistic dissection of biochemical processes in intact cel- 
lular systems: genetics (a search for informative mutants) 
and the use of chemical agents (drugs and chemical toxins).
Both approaches help us to understand complex systems 
by disrupting some specific functional element and show- 
ing us the result. With the development of techniques for 
genetic manipulation and cloning, the genetic approach 
can be effectively applied either in vitro or in vivo, although 
the in vitro route is usually quicker. The chemical approach 
can also be applied to either sort of biological system; here, 
however, the bulk of consistently acquired information is 
in experimental animals (rats and mice). While most biolo- 
gists know a short list of compounds having specific, experi-
mentally useful effects (e.g., inhibitors of protein synthesis,
ionophores, polymerase inhibitors, channel blockers, nu- 
cleotide analogs, and compounds affecting polymerization 
of cytoskeletal proteins), there is a much larger number of 
interesting chemically-induced effects, most of them char- 
acterized by toxicologists and pharmacologists in rodent 
systems. Just as a thorough genetic analysis would involve 
saturating a genome with mutations, it is possible to ima- 
gine a saturating number of drugs, the analysis of whose ac- 
tions would reveal the complete biochemistry of the cell. 
While organized drug discovery efforts usually target spe- 
cific desired effects, the nature of the process, with its de- 
pendence on screening large numbers of compounds, ne- 
cessarily produces many unanticipated effects. It is there- 
fore reasonable to suppose that the required broad range of 
compounds necessary to achieve "biochemical saturation" 
may be forthcoming; in fact, it may already exist among the 
hundreds of thousands of compounds that failed to qualify 
as drugs. 

Among organs, the liver is an obvious choice for the study 
of chemical effects because of its well-known plasticity and 
responsiveness. The brain appears to be quite plastic (e.g. 
[7]), but it is a complicated mixture of cell types requiring 
skillful dissection for most experiments. The kidney, while 
quite responsive, also presents a potentially confounding 
mixture of cell types. The liver, by contrast, is made up of 
one predominant cell type which is easy to solubilize: the 
hepatocyte, representing more than 95% of its mass. Most 
importantly, the liver performs many homeostatic func- 
tions that require rapid modulation of gene expression. It 
appears that most chemical agents tested affect gene ex- 
pression in the liver at some dosage (N. Leigh Anderson, 
unpublished observations), an interesting contrast to our 
earlier work with lymphocytes, for example, which seem to 
be much less responsive. Such results conform to the expec- 
tation that cells with a homeostatic, physiological role 
should be more plastic than cells differentiated for a pur- 
pose dependent on the action of a limited number of spe- 
cific genes. 

The liver also allows the parallels between in vitro and in 
vivo systems to be examined in detail. Significant progress
has been made in the development of mouse, rat and hu-
man hepatocyte culture systems, as well as in precision-cut
tissue slices. Using such an array of techniques, it is possi-
ble to assemble a matrix of mammalian systems including
mouse and rat in vivo on one level and mouse, rat and hu-
man in vitro on a second level, and to compare effects be-
tween species and between systems. This approach allows
us to draw informed conclusions regarding the biochemical
"universality" of biological responses among the mammals
and to offer some insight into the validity of in vitro ap-
proaches for toxicological screening. We believe this data
will be necessary if in vitro alternatives are to achieve wide 
usage in government-mandated safety testing of drugs, con- 
sumer products and industrial and agricultural chemicals. 

A number of interesting studies have been published using 
2-D mapping to examine effects in the rodent liver. A num-
ber of investigators have made use of the technique to
screen for existing genetic variants [8-11] or induced muta-
tions [12-14], mainly in the mouse. This work builds on the
wealth of genetic information available on the mouse and
its established position as a mammalian mutation-detec-
tion system. While some studies of chemical effects have
been undertaken in the mouse [15-17], most have used the
rat [18-23]. The examination of the cytochrome P-450 sys-
tem, in particular, has been carried out almost exclusively
on the rat [24, 25]. 

These considerations lead us to conclude that rodent liver 
offers the best opportunity to systematically examine an 
array of gene regulation systems, and ultimately to build a 
predictive model of large-scale mammalian gene control. 
The basic underlying foundation of such a project is a reli-
able, reproducible master 2-D pattern of liver, to which on-
going experimental results can be referred. In this paper, we
report such a master pattern for the acidic and neutral pro-
teins of rat liver (pattern F344MST3). In future, this master
will be supplemented by maps of basic proteins, and analog-
ous maps of mouse and human liver.



2 Materials and methods 
2.1 Sample preparation 

Liver is an ideal sample material for most biochemical stud- 
ies, including 2-D analysis. A sample is taken of approxima- 
tely 0.5 g of tissue from the apical end of the left lobe of the 
liver. Solubilization is effected as rapidly as practical; a
delay of 5-15 min appears to cause no major alteration in
liver protein composition if the liver pieces are kept cold
(e.g., on ice) in the interim. In the solubilization process,
the liver sample is weighed, placed in a glass homogenizer
(e.g., 15 mL Wheaton); 8 volumes of solubilizing solution*

* The solubilizing solution is composed of 2% NP-40 (Sigma), 9 M urea
(analytical grade, e.g., BDH or Bio-Rad), 0.5% dithiothreitol (DTT;
Sigma) and 2% carrier ampholytes (pH 9-11 LKB; these come as a 20%
stock solution, so 2% final concentration is achieved by making the final
solution 10% 9-11 Ampholine by volume). A large batch of solubilizer
(several hundred mL) is made and stored frozen at -80°C in aliquots
sufficient to provide enough for one day's estimated sample prepara-
tion requirement. The solution is never allowed to become warmer
than room temperature at any stage during preparation or thawing for
use, since heating of concentrated urea solutions can produce contami-
nants that covalently modify proteins, producing artifactual charge
shifts. Once thawed, any unused solubilizer is discarded.
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is added (i.e., 4 ml per 0.5 g tissue) and the mixture is ho- 
— mogeaized-using first theyloose-antfrbtnihenihe tight-fit- 
ting g lass pestle. This takes approximately 5 strokes with 
each pestle and is earned out at room temperature be"cau«e 
urea would crystallize out in the cold. Once the liver sample 
is thoroughly homogenized in the solubili2er. it i* a«i>med 
that all tbe.proteins are denatured (by the chaotropic effect 
of ibe urea and NP-40 detergent) and the enrvmes inacti- 
vated by the high pH (-9 J). Therefore these samples may 
be kept at room temperature until they can be centrifuged 
or frozen as a group (within several hours of preparation) 
The samples are centrifuged for 6 X 10* g min (e.g., 500 000 
X g for 12 min using a Beckman TL-100 centrifuge) The 
centrifuge rotor is maintained at just below room tempera- 
ture (e.g., 15-20 C C), but not too cold, so as to prevent" the 
precipitation of urea. The centrifuge of choice is i Beckman 
TL-100 because of the sample tube sizes available, but anv 
ultracentnfuge accepting smallish tubes wjj] suffice When 
an appropriate centrifuge is not available near the site of 
sample preparation, samples can be frozen at -80'C and 
thawed prior to centrifugaiion and collection of superna- 
tarns. Each supernatant is carefully removed following cen- 
tnfugation and aliquoied into at least 4 clean tubes forstor- 
age.This is done by transferring all the supernatant to one 
clean tube, mixing this gently (to assure homogeneous 
composition) and then dividing it into 4 aljquots The ali- 
quots are frozen immediaiely at -80°C. These multiple ali- 
quots can provide insurance against a failed run or a freezer 
breakdown. 
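
The quantities in the sample preparation protocol can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that 1 g of tissue is treated as 1 mL when computing "volumes" of solubilizer.

```python
def solubilizer_volume_ml(tissue_g, volumes=8.0):
    """Solubilizer volume in mL, assuming 1 g of tissue counts as 1 mL."""
    return tissue_g * volumes


def stock_fraction(final_percent, stock_percent):
    """Fraction of the final solution that must be stock to reach final_percent."""
    return final_percent / stock_percent


def g_minutes(rcf_g, minutes):
    """Centrifugation dose expressed as relative centrifugal force times minutes."""
    return rcf_g * minutes


def minutes_for_dose(target_g_min, rcf_g):
    """Run time needed at a given force to deliver a target g-minute dose."""
    return target_g_min / rcf_g


if __name__ == "__main__":
    print(solubilizer_volume_ml(0.5))      # mL of solubilizer for 0.5 g of liver
    print(stock_fraction(2.0, 20.0))       # fraction of 20% ampholyte stock for 2% final
    print(g_minutes(500_000, 12))          # g min delivered by 500 000 x g for 12 min
    print(minutes_for_dose(6e6, 200_000))  # run time at a slower, hypothetical rotor speed
```

The last helper shows how the same g-minute dose could be matched on a different centrifuge; the 200 000 × g figure is a made-up example, not a value from the protocol.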
2.2 Two-dimensional electrophoresis

Sample proteins are resolved by 2-D electrophoresis using the
20 × 25 cm ISO-DALT 2-D gel system ([26-29]; produced by LSB and
by Hoefer Scientific Instruments, San Francisco) operating with
20 gels per batch. All first-dimensional isoelectric focusing
(IEF) gels are prepared using the same single standardized batch
of carrier ampholytes (BDH 4-8A in the present case, selected by
LSB's batch-testing program for rat and mouse database work**).
A 10 µL sample of solubilized liver protein is applied to each
gel and the gels are run for 33 000 to 34 500 volt-hours using a
progressively increasing voltage protocol implemented by a
programmable high-voltage power supply. An Angelique
computer-controlled gradient-casting system (produced by LSB) is
used to prepare second-dimensional sodium dodecyl sulfate (SDS)
polyacrylamide gradient slab gels in which the top 5% of the gel
is 11%T acrylamide, and the lower 95% of the gel varies linearly
from 11% to 18%T.

This system has recently been modified so as to employ a
commercially available 30.8%T acrylamide/N,N'-methylenebisacrylamide
prepared solution (thus avoiding the handling of the solid
acrylamide monomer) and three additional stock solutions: buffer
(made from Sigma pre-set Tris), persulfate and
N,N,N',N'-tetramethylethylenediamine (TEMED). Each gel is
identified by a computer-printed filter paper label polymerized
into the lower left corner of the gel. First-dimensional IEF tube
gels are loaded directly (as extruded) onto the slab gels without
equilibration, and held in place by [...] of 20 in cooled DALT
tanks [...]. All run parameters, reagent source and lot
information, and [...] deviation from expected results are
entered by the technician responsible on a detailed, multi-page
record of the experiment.

** This material (succeeding certified batches of which are available
from Hoefer Scientific Instruments) has the most linear pH gradient
produced by any ampholyte tested except for the Pharmacia wide range
(which has an unacceptable tendency to bind high-molecular weight
acidic proteins, causing them to streak).

2.3 Staining

Following SDS-electrophoresis, slab gels are stained for protein
using a colloidal Coomassie Blue G-250 procedure in [...] plastic
boxes, with 10 gels (totalling approximately [...] of gel) per
box. This procedure (based on the work of [...] [30, 31])
involves fixation in [...] of 50% ethanol and 2% phosphoric acid
for 2 h, three 30 min washes, each in 2 L of cold tap water, and
transfer to 1.5 L of 34% methanol, 17% ammonium sulfate and 2%
phosphoric acid for 1 h, followed by the addition of a gram of
powdered Coomassie Blue G-250 stain. Staining requires
approximately 4 days to reach equilibrium intensity, whereupon
gels are transferred to cool tap water and their surfaces rinsed
to remove any particulate stain prior to scanning. Gels may be
kept for several months in water with added sodium azide. The
water washes remove ethanol that would dissolve the [...]
(rendering the system non-colloidal, with high backgrounds). The
concentrated ammonium sulfate and methanol solution is diluted by
equilibration with the water volume of the gels to automatically
achieve the correct final concentrations for colloidal staining.
Practical advantages of [...] approach can be summarized as
follows: (i) the low, flat background makes computer evaluation
of small spots (max OD < 0.02) possible, especially when using
laser densitometry; (ii) up to 1500 spots can be reliably
detected on many gels (e.g., rat liver) at loadings low enough to
preserve excellent resolution; and (iii) reproducibility appears
to be very good: at least several hundred spots have coefficients
of reproducibility less than 15%. This value is at least as good
as previous CBB methods, and significantly better than many
silver stain systems.
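
The dilution step described above (a concentrated stain bath equilibrating with the water held in the gels) can be sketched as simple volume arithmetic. The 1.5 L bath volume and the 34% and 17% starting concentrations come from the text; the gel water volume of 1.0 L used in the example is an assumption made purely for illustration, and the function name is hypothetical.

```python
def diluted_percent(stock_percent, stain_volume_l, gel_water_l):
    """Concentration after the stain bath equilibrates with the water in the gels.

    Assumes ideal mixing and additive volumes.
    """
    if stain_volume_l <= 0 or gel_water_l < 0:
        raise ValueError("volumes must be positive")
    return stock_percent * stain_volume_l / (stain_volume_l + gel_water_l)


if __name__ == "__main__":
    gel_water_l = 1.0  # assumed water volume of the gels in one box (illustrative)
    for name, stock in (("methanol", 34.0), ("ammonium sulfate", 17.0)):
        final = diluted_percent(stock, 1.5, gel_water_l)
        print(f"{name}: {stock}% in bath, about {final:.1f}% at equilibrium")
```

Under this assumption the bath concentrations fall to roughly three fifths of their starting values; a different gel volume would shift the final figures proportionally.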

2.4 Positional standardization 

Carbamylated rabbit muscle creatine phosphokinase (CPK)
standards [32] are purchased from Pharmacia and BDH. Amino acid
compositions, and numbers of residues present in proteins used
for internal standardization, are taken from the Protein
Identification Resource (PIR) sequence database [33].



2.5 Computer analysis 

Stained slab gels are digitized in red light at 134 micron
resolution, using either a Molecular Dynamics laser scanner (with
pixel sampling) or an Eikonix 78/99 CCD scanner. Raw digitized
gel images are archived on high-density DAT tape (or equivalent
storage media) and a greyscale videoprint is prepared from the
raw digital image as hard-copy backup of the gel image. Gels are
processed using the Kepler software system (produced by LSB), a
commercially available workstation-based software package built
on some of the principles of the earlier TYCHO system [34- 4].
Procedure PROC00S is used to yield a spotlist giving position,
shape and density information for each detected spot. This
procedure makes use of digital filtering, mathematical morphology
techniques and digital masking to remove the background, and uses
full 2-D least-squares optimization to refine the parameters of a
2-D Gaussian shape for each spot. Processing parameters and file
locations are stored in a relational database, while various log
files detailing operation of the automatic analysis software are
archived with the reduced data. The computed resolution and level
of Gaussian convergence of each gel are inspected and archived
for quality control purposes.
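
To make the idea of fitting a 2-D Gaussian spot model concrete, the following is a deliberately reduced sketch. It is not the Kepler procedure: it holds the spot centre and widths fixed and solves only for the amplitude, which is the one parameter with a closed-form least-squares solution. All names are hypothetical.

```python
import math


def gaussian_2d(x, y, amp, x0, y0, sx, sy):
    """Value of an axis-aligned 2-D Gaussian spot model at pixel (x, y)."""
    return amp * math.exp(-0.5 * (((x - x0) / sx) ** 2 + ((y - y0) / sy) ** 2))


def fit_amplitude(image, x0, y0, sx, sy):
    """Least-squares amplitude for a Gaussian with fixed centre and widths.

    Minimising sum((A*g - d)^2) over A gives A = sum(g*d) / sum(g*g).
    """
    num = 0.0
    den = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            g = gaussian_2d(x, y, 1.0, x0, y0, sx, sy)
            num += g * value
            den += g * g
    return num / den


def spot_volume(amp, sx, sy):
    """Integrated density of the fitted spot (volume under the Gaussian)."""
    return 2.0 * math.pi * amp * sx * sy


if __name__ == "__main__":
    # Synthetic 7 x 7 background-free window containing one spot.
    window = [[gaussian_2d(x, y, 0.8, 3, 3, 1.2, 1.6) for x in range(7)]
              for y in range(7)]
    amp = fit_amplitude(window, 3, 3, 1.2, 1.6)
    print(amp, spot_volume(amp, 1.2, 1.6))
```

A full fit would also refine position and widths iteratively and would handle overlapping spots; those steps are omitted here.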

Experiment packages are constructed using the Kepler experiment
definition database to assemble groups of 2-D patterns
corresponding to the experimental groups (e.g., treated and
control animals). Each 2-D pattern is matched to the appropriate
"master" 2-D pattern (pattern F344MST3 in the case of Fischer 344
rat liver), thereby providing linkage to the existing rodent
protein 2-D databases. The software allows experiments containing
hundreds of gels to be constructed and analyzed as a unit, with
up to 100 gels displayed on the screen at one time for
comparative purposes and multiple pages to accommodate
experiments of > 1000 gels. For each treatment, proteins showing
significant quantitative differences vs. appropriate controls are
selected using group-wise statistical parameters (e.g., Student's
t-test, Kepler procedure STUDENT). Proteins satisfying various
quantitative criteria (such as P < 0.001 difference from
appropriate controls) are represented as highlighted spots
onscreen or on computer-plotted protein maps and stored as spot
populations (i.e., logical vectors) in a liver protein database.
Quantitative data (spot parameters, statistical or other computed
values) are stored as real-valued vectors in the database.
Analysis of coregulation is performed using a Pearson
product-moment correlation (Kepler procedure CORREL) to determine
whether groups of proteins are coordinately regulated by any of
the treatments. Such groups can be presented graphically on a
protein map, and reported together with the statistical criteria
used to assess the level of coregulation. Multivariate
statistical analysis (e.g., principal components analysis) is
performed on data exported to SAS (SAS Institute).
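
The two statistics named above can be written out in a few lines. This is a generic textbook sketch, not the Kepler STUDENT or CORREL procedures, whose internals are not described here; the abundance values are invented for illustration.

```python
import math
from statistics import mean, variance


def student_t(treated, control):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Converting t to a P value requires
    the t distribution, which is not part of the standard library.
    """
    n1, n2 = len(treated), len(control)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / df
    t = (mean(treated) - mean(control)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


def pearson_r(a, b):
    """Pearson product-moment correlation between two equal-length vectors."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


if __name__ == "__main__":
    # Hypothetical abundances of one spot in five control and five treated gels.
    control = [1.0, 1.2, 0.9, 1.1, 1.0]
    treated = [3.9, 4.4, 4.1, 5.0, 4.6]
    print(student_t(treated, control))
    # Hypothetical abundances of a second spot in the same five treated gels.
    other_spot = [2.0, 2.3, 2.1, 2.6, 2.4]
    print(pearson_r(treated, other_spot))
```

Spots whose abundance vectors correlate strongly across the same set of gels would be candidates for a coregulation group.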

2.6 Graphical data output 

Graphical results are prepared in GKS and translated within
Kepler into output for any of a variety of devices. Line-drawing
output is typically prepared as PostScript and printed on an
Apple LaserWriter. Detailed maps presented here have been
generated using an ultra-high-resolution PostScript-compatible
Linotronic output device. Greyscale graphics are reproduced from
the workstation screen using a Seikosha videoprinter. Patterns
are shown in the standard orientation, with high molecular mass
at the top and acidic proteins to the left.

2.7 Experiment LSBC04 

In the study described here, 12-week-old Charles River male F344
rats were used. Diets were prepared at LSB, based on a Purina
5755M Basal Purified Diet. Lovastatin and cholestyramine were
obtained as prescription pharmaceuticals, ground and mixed with
the diet at concentrations of 0.075% and 1%, respectively; the
high cholesterol diet was Purina f S01M-A (5% cholesterol plus 1%
sodium cholate in the control diet). Animal work was carried out
by Microbiological Associates (Bethesda, MD). Animals were
acclimatized for one week on the control diet, fed test or
control diets for one week, and sacrificed on day g. Average
daily doses of lovastatin and cholestyramine in appropriate
groups were 37 mg/kg/day and 5 g/kg/day, respectively, based on
the weight of the food consumed. Liver samples were collected and
prepared for 2-D electrophoresis according to the standard liver
protocol (homogenization in 8 volumes of 9 M urea, 2% NP-40, 0.5%
dithiothreitol, 2% LKB pH 9-11 carrier ampholytes, followed by
centrifugation for 30 min at SO 000 X g). Kidney, brain and
plasma samples were frozen. Gels were run as described above, and
the data were analyzed using the Kepler system. Gels were scaled,
to remove the effect of differences in protein loading, by
setting the summed abundances of a large number of matched spots
equal for each gel (linear scaling).
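
Linear scaling as described above can be sketched as follows. The data layout (a dictionary of spot abundances per gel) and the choice of the mean total as the common target are assumptions made for illustration; the text specifies only that the summed abundances of matched spots are set equal for each gel.

```python
def linear_scale(gels):
    """Scale every gel so the summed abundance of its matched spots is equal.

    gels maps a gel identifier to a dict of {master_spot_number: abundance}.
    Only spots present on every gel are used to compute the scale factor,
    but the factor is applied to all spots on the gel.
    """
    matched = set.intersection(*(set(spots) for spots in gels.values()))
    if not matched:
        raise ValueError("no spots are matched across all gels")
    totals = {gel: sum(spots[msn] for msn in matched)
              for gel, spots in gels.items()}
    target = sum(totals.values()) / len(totals)  # assumed common target
    return {gel: {msn: value * target / totals[gel]
                  for msn, value in spots.items()}
            for gel, spots in gels.items()}


if __name__ == "__main__":
    # Gel "B" carries twice the protein load of gel "A" (hypothetical values).
    gels = {
        "A": {413: 10.0, 235: 30.0},
        "B": {413: 20.0, 235: 60.0, 367: 5.0},
    }
    for gel, spots in linear_scale(gels).items():
        print(gel, spots)
```

After scaling, a spot that differs between gels only because of loading shows the same abundance on both, so remaining differences reflect regulation rather than sample amount.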



3 Results and discussion 

3.1 The rat liver protein 2-D map 

F344MST3 is a standard 2-D pattern of rat liver proteins, based
on the Fischer 344 strain. This pattern was initiated from a
single 2-D gel and extensively edited in an experiment comparing
it to a range of protein loads, so as to include both small spots
and well-resolved representations of high-abundance spots. More
than 700 rat liver 2-D patterns have been matched to F344MST3 in
a series of drug effects and protein characterization
experiments, and numerous new spots (induced by specific drugs,
for instance) have been added as a result. A modified version
including additional spots present in the Sprague-Dawley outbred
rat has also been developed (data not shown). Figure 1 shows a
greyscale representation and Fig. 2 a schematic plot of the
master pattern. More than 1200 spots are included, most of which
are visible on typical gels loaded with 10 µL of solubilized
liver protein prepared by the standard method and stained with
colloidal Coomassie Blue. Master spot numbers (MSN's) have been
assigned to all proteins, and appear in the following figures,
each showing one quadrant of the pattern. Figure 3 shows the
upper left (acidic, high molecular mass) quadrant, Fig. 4 the
upper right (basic, high molecular mass) quadrant, Fig. 5 the
lower left (acidic, low molecular mass) quadrant, and Fig. 6 the
lower right (basic, low molecular mass) quadrant. The quadrants
overlap as an aid to moving between them. The gel position (in
100 micron units), isoelectric point (relative to the CPK
internal pI standards) and SDS molecular mass (from the
calibration curve in Fig. 8) are listed for each spot (Table 1).
Because of the precision of the CPK-pI values, these parameters
can be used to relate spot locations between gel systems more
reliably than using pI measurements expressed as pH. A major
objective of current studies is the identification of all major
spots corresponding to known liver proteins, as well as rigorous
definitions of subcellular organelle contents. Of particular
interest to us is the parallel development of identifications in
the rat and mouse liver maps, allowing detailed comparisons of
gene expression effects in the two systems. The results of these
studies will be presented systematically in a later edition of
this database,
3.2 Carbamylated charge standards, computed pI's and molecular
mass standardization

We have previously shown that the use of [...] closely-spaced
internal pI markers ([...] basic protein) offers an accurate and
workable [...] problem of assigning positions in the pI [...].
The same system, based on 56 protein species [...] carbamylating
rabbit muscle CPK, has been [...] assign pI's to most rat liver
acidic [...] standards were [...] and the standard spots added to
a [...] master pattern F344MST3. [...] liver protein spots lying
within the CPK charge train [...] then transformed into CPK pI
positions [...] between the positions of [...] (Table 1) using a
Kepler vector procedure.

It has proven possible to compute fairly accurate pI values for
many proteins from the [...] [42]. We have attempted here [...]
approach, [...] which we compute [...] themselves, based on our
knowledge of [...] CPK sequence and the fact that adjacent
members of a charge train typically differ by [...] lysine
residue (Table 3). We [...] computed pI's for an additional set
of carbamylated standards made from human hemoglobin [...] of rat
liver and human plasma proteins of known position and sequence
(Fig. 7, Table 4). The result demonstrates good concordance
between these [...] proteins show significant deviations: liver
fatty-acid binding protein (FABP; #1 in Table 4) and protein
disulfide [...] (#20 in the table). The FABP spot [...] F344MST3
may represent a charge-modified version [...] in the [...]/SDS
gel. Of particular importance is the fact that by comparing
computed pI's of sequenced but unlocated proteins with the CPK
pI's we can [...] position without [...] gel pH gradient. This
offers a useful [...]. [...] have used this approach to compute
the CPK pI's of [...] and mouse proteins in the PIR sequence
database [...] to protein identification (data not shown).
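
The conversion of a gel X coordinate into a CPK pI position can be illustrated with a short sketch. It assumes linear interpolation between the two nearest charge standards, which is one plausible reading of the passage above rather than a confirmed description of the Kepler vector procedure. The standard positions below are invented; the out-of-range strings simply mimic the "<-35.0" and ">0.0" entries seen in Table 1.

```python
from bisect import bisect_right


def cpk_pi(x, standards):
    """Map a gel X coordinate to a CPK pI position by linear interpolation.

    standards is a list of (x_position, cpk_charge_number) pairs sorted by
    increasing x. Spots outside the charge train are reported as being
    beyond the nearest end standard.
    """
    xs = [pos for pos, _ in standards]
    pis = [pi for _, pi in standards]
    if x < xs[0]:
        return f"<{pis[0]:.1f}"
    if x > xs[-1]:
        return f">{pis[-1]:.1f}"
    hi = min(bisect_right(xs, x), len(xs) - 1)
    lo = hi - 1
    frac = (x - xs[lo]) / (xs[hi] - xs[lo])
    return f"{pis[lo] + frac * (pis[hi] - pis[lo]):.1f}"


if __name__ == "__main__":
    # Hypothetical positions (100 micron units) of three adjacent standards.
    standards = [(1100.0, -10.0), (1150.0, -9.0), (1210.0, -8.0)]
    for x in (1080.0, 1125.0, 1204.0, 1300.0):
        print(x, cpk_pi(x, standards))
```

Because the standards are unevenly spaced along the gel, interpolating within each interval tracks the local pH gradient better than a single straight line across the whole gel would.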



In order to standardize SDS molecular weight (SDS-MW), we have
used a standard curve fitted to a series of [...] proteins
(Fig. 8). Rather than using [...], we have elected to use the
number of amino acids [...] polypeptide chain, as perhaps a
better indication of the length of the SDS-coated rod that is
[...] by the [...]-dimension slab. The resulting values were
[...]. We tried many equations and selected the best [...]
program "Tablecurve" on a PC [...] y coordinate, c is 511.S3
[...]. The resulting fit appears to be [...] good over a broad
range of molecular mass.

[...] test of [...] agents included in the [...] (Mevacor, an
inhibitor of HMG-CoA reductase), cholestyramine (a bile acid
sequestrant that has the effect of [...] cholesterol from the
gut-liver recirculation) [...]. The first two agents should [...]
and the third should raise it, allowing [...] of relevant gene
expression [...]. [...] an experiment offers an [...] test of the
2-D mapping system since most of the pathway enzymes are present
in low abundance, many are [...] and difficult to solubilize, and
[...]. Approximately 1000 proteins were [...] and detected in
[...] homogenates. [...] were found to be affected by at least
one [...] several coregulated groups.

3.3.1 MSN 413 (cytosolic HMG-CoA synthase) and sets of spots
regulated coordinately or inversely



[...] assigned to the cyto[...] increase in [...] the synergistic
[...] cholestyramine, and a dramatic decrease [...] diet. Spot
number [...] strongly regulated protein in the present experiment
[...] induction after a 1 week [...] 1% cholestyramine [...] 10).
Its expression follows precisely [...] abundance [...]
progressively increased from [...] cholestyramine, lovastatin and
lovastatin plus cholestyramine, and sinks below the threshold of
[...] animals fed the high cholesterol diet. This spot [...]
identified as the cytosolic [...] reaction with an antiserum to
that [...] provided by Dr. Michael Greenspan at [...] Research
Laboratories. This enzyme lies [...] HMG-CoA reductase in the
[...] pathway, and is known to be co-regulated [...] molecular
weight of [...] (Kepler [...]) [...] spots was found to be
coregulated with 413 [...] of correlation was exceedingly high
[...] approximately one charge more acidic than 413 (Fig. [...])
[...] that they may be covalently modified forms of [...] CoA
synthase [...]
to comprise an additional related pair (1253 and 1001) of around
40 kDa and a single spot (1119) of around 21 kDa. Because these
two presumed proteins are present at substantially lower
abundances than 413, and because the cytosolic HMG-CoA synthase
is reported to consist of only one type of polypeptide, they are
likely to represent other, very tightly coregulated enzymes. A
second group of six spots was selected based on a regulatory
pattern close to the inverse of that for spot 413 (MSN's 34, 79,
178, 182, 204, 347; data not shown). For these proteins, the
lowest level of expression occurs with exposure to lovastatin
plus cholestyramine and the highest level upon exposure to the
high-cholesterol diet. Spots 182 and 79 are highly correlated and
lie about one charge apart at the same molecular weight; they may
thus be isoforms of a single protein. The other four spots
probably represent additional enzymes or subunits.

3.3.2 MSN 235 and coregulated spots

A third group of five spots, mainly comprised of mitochondrial
proteins including putative mitochondrial HMG-CoA synthase spots,
showed a modest induction by lovastatin alone, but little or no
effect with any of the other treatments (including the
combination of lovastatin and cholestyramine; Fig. 12). This
result is intriguing because lovastatin was expected to affect
only the regulation of enzymes of cholesterol synthesis, which is
entirely extra-mitochondrial. Three of the spots (235, 134, 144)
form a closely-packed triad at approximately 30 kDa, and are
likely to represent isoforms of one protein. All three spots are
stained by an antibody to the mitochondrial form of HMG-CoA
synthase obtained from Dr. Greenspan. Subcellular fractionation
indicates a mitochondrial location. The other two spots (633 at
about 38 kDa and 724 at about 69 kDa) are each present at lower
abundance than the members of the triad.

3.3.3 An example of an anti-synergistic effect

A sixth spot (367) shows strong induction by lovastatin (two- to
threefold), and about half as much induction with lovastatin plus
cholestyramine, but without sharing the animal-animal
heterogeneity pattern of the 235-set (Fig. 13). This protein is
also mitochondrial, and represents the clearest example of an
anti-synergistic effect of lovastatin and cholestyramine. The
existence of such an effect demonstrates that lovastatin and
cholestyramine do not act exclusively through the same regulatory
pathway.

3.3.4 Complexity of the cholesterol synthesis pathway

Taken together, these results suggest that treatment with
lovastatin alone can affect both cytosolic and mitochondrial
pathways using HMG-CoA, while cholestyramine, on the other hand,
either alone or in combination with lovastatin, produces a strong
effect on the putative cytosolic pathway, but little or no effect
on the putative mitochondrial pathway. An explanation for this
difference may lie in lovastatin's effect on levels of HMG-CoA
and related precursor compounds that are exchanged between the
cytosol and the mitochondrion, whereas cholestyramine should
affect only the cytosolic pathways directly controlled by
cholesterol and bile acid levels. It remains to be explained why
some proteins of the putative mitochondrial pathway are so much
more variable in their expression in all groups. An examination
of all the coregulated groups suggests that quantitative
statistical techniques can extract a wealth of interesting
information from large sets of reproducible gels. The abundance
of spots in the 413 coregulation group, for example, shows an
amazing level of concordance in their relative expression among
the five individuals of the lovastatin and cholestyramine
treatment group. This effect is not due to differences in total
protein loading, since they have already been removed by scaling,
and since proteins with quite different regulation patterns can
be demonstrated (e.g., Fig. 13). Such effects raise the
possibility that many gene coregulation sets may be revealed
through the study of a sufficiently large population of control
animals (i.e., without any experimental manipulation). This
approach, exploiting natural biological variation in protein
expression instead of drug effects, offers an important incentive
for the construction of a large library of control animal
patterns.



4 Conclusions 

Because of the widespread use of rat liver in both basic bio- 
chemistry and in toxicology, there is a long-term need for a 
comprehensive database of liver proteins. The rat liver mas- 
ter pattern presented here has proven to be an accurate re- 
presentation of this system, having been matched to more 
than 700 gels to date. As the number of proteins identified 
and the number of compounds tested for gene expression 
effects grows, we expect this database to contribute valu- 
able insights into gene regulation. Its practical utility in sev- 
eral areas of mechanistic toxicology is already being de- 
monstrated. 

Received September 11, 1991 
Figure 1. Synthetic representation of the standard rat liver 2-D master pattern, rendered as a greyscale image using a videoprinter.

Figure 2. Schematic representation of the master pattern (the same as Fig. 1), useful as an aid in relating [...] quadrants.

Figure 7. (a) Plot of computed isoelectric point versus gel X-position for
two sets of carbamylated standard proteins (rabbit muscle CPK and
human hemoglobin β chain, filled diamonds) and several other proteins
(shaded squares). (b) The identities of the various proteins represented
by the squares are indicated by the numbers in corresponding positions
on (a); these refer to Table 4.
Figure 9. Montage showing effects in the region of MSN:413. The
montage shows a small window into one portion of the 2-D pattern,
one row of windows for each experimental group, and one panel for
each gel in the experiment. The left-most pattern in each row is
a group-specific copy of the master pattern followed by the
patterns for the five individual rats in the group. The
highlighted protein spots (filled circles) are spot 413 (on the
right of each panel; identified as cytosolic HMG-CoA synthase)
and two modified forms of it (1250 and 933). From the top, the
rows (experimental groups) are: high cholesterol, controls,
cholestyramine, lovastatin, and lovastatin plus cholestyramine.

Figure 10. Bargraph showing the quantitative effects of various
treatments on the abundance of MSN:413 (cytosolic HMG-CoA
synthase) in the gels of Fig. 9.

Figure 11. Bargraphs of a series of six coregulated spots
including MSN:413. In the bargraphs, the abundances of the
appropriate spot (master spot number shown at the top of the
panel) in each animal are shown. The five five-animal groups are
in the order (left to right): high cholesterol, controls,
cholestyramine, lovastatin and lovastatin plus cholestyramine.
Each bar within a group represents one experimental animal liver
(one 2-D gel). Note the correlated expression of the 6 spots,
especially in the two far right (most strongly induced) groups.

Figure 13. Data on spot MSN:367, presented as in Fig. 11. This protein
shows unambiguously the anti-synergistic effect of lovastatin and
cholestyramine (fifth group) as compared to lovastatin (fourth group).
This response contrasts strongly with the regulation pattern seen in Fig. 11.



7 Addendum 2: Tables 1-4 

Table 1. Master table of proteins in the rat liver database

 MSN      X      Y   CPKpI    SDSMW
   3    311    434  <-35.0   63,800
   5    568    263   -24.3  102,900
   8    812    426   -16.0   64,800
  11    548    268   -25.2  101,000
  15    645    520   -15.3   55,200
  17    629    589   -21.6   50,000
  18    906    414   -14.0   66,300
  19    755    298   -17.5   90,200
  20    649    403   -20.9   67,900
  21   1204    448    -8.7   62,100
  22    332    434  <-35.0   63,800
  23    787    424   -16.6   65,000
  24    313    417  <-35.0   66,000
  25    807    516   -16.1   55,500
  27   1164    524    -9.0   54,900
  28   1263    446    -6.0   62,400
  29    743    605   -17.8   49,000
  30    768    112   -17.2  348,600
  32   1216    417    -8.6   66,000
  33   1145    445    -9.5   62,500
  34   1037    555   -11.3   52,400
  35    863    412   -14.9   66,600
  36    712    606   -18.7   46,900
  38    763    694   -17.3   43,800
  39    304    470  <-35.0   59,800
  41   1165    569    -9.2   51,400
  42    664    607   -19.6   46,800
  43   1318    589    -7.3   50,000
  44   1924    362    -0.1   74,600
  46   1203  [...]    -8.7   50,200
  47   1391    447    -6.3   62,300
  48    309    454  <-35.0   61,500
  49    605    587   -22.5   50,100
  50    621    535   -21.8   53,900
  51   1113    522   -10.0   55,000
  52   1620    499    -0.8   57,000
  53    725    177   -18.3  170,800
  54   2001    500    >0.0   56,900
  55    722    830   -18.4   37,300
  56    678    533   -19.8   54,100
  57   1682    302    -2.5   89,000
  58   1091    560   -10.3   50,600
  59   1171    565    -9.2   50,300
  60   1400    624    -6.2   47,600
  61   1653    508    -0.6   56,200
  62   1668    567    -0.4   51,500
  65    735    297   -18.1   90,500
  66   1263    312    -6.0   85,900
  67   1252    407    -8.1   67,300
  68    779    662   -16.8   43,900
  69   1064    296   -10.8   90,600
  71    656    589   -20.6   50,000
  72    638    545   -21.2   53,100
  73   1582    583    -3.6   50,400
  74   1570    556    -3.8   52,300
  75   1264    621    -8.0   46,000
  76   1338    564    -7.0   51,600
  77   1833    363    -0.6   74,400
  78   1767    565    -1.5   51,700
  79    925    738   -13.6   41,600
  80    534    698   -26.1   43,600
  81   1811    363    -1.0   74,500
  82   1412    661    -6.0   44,500
  83   1471    347    -5.0   77,500
  84   1662    563    -2.7   51,800
  85   1596    479    -3.4   58,900
  86   1817    301    -0.9   69,100
  87    516   1371   -27.0   17,400
  88   1589    698    -3.5   43,600
  89   1706    719    -2.2   42,500
  90    651    329   -20.6   61,700
  91   1415    710    -6.0   43,000
  92   1773    545    -1.4   53,200
  93   1338    446    -7.0   62,300
  94   1706    696    -2.2   43,700



 MSN      X      Y   CPKpI    SDSMW
  95   1119    536   [...]   53,800
  96   1731    756    -2.0   40,700
  97   1033    566   -11.4   51,600
  98   1406    565    -6.1   51,700
  99    578   1149   -23.6   25,000
 100   2004    538    >0.0   53,700
 101   1106    623   -10.1   47,900
 102    462    455   -26.5   61,300
 103    665    630   -20.2   37,300
 104    773   1162   -17.0   23,800
 105    312   1117  <-35.0   26,100
 106   1769    509    -1.5   56,100
 107   1565    720    -3.6   42,500
 108   1662    807    -2.4   38,300
 109   1482    593   [...]   49,700
 110    778    516   -16.9   55,500
 111   1728    700    -2.0   43,500
 113   1191    680    -6.9   44,500
 114   1296    185    -7.5  160,800
 115    662    907   -19.6   34,100
 116   1146    610    -9.5   46,700
 117   1548    640    -4.1   36,500
 118   1050    577   -11.1   50,800
 120   1530    628    -4.3   37,400
 121    636    423   -15.4   65,200
 122   1572    712    -3.8   42,900
 123     23   1433  <-35.0   15,300
 124    621   1474   -21.9   13,900
 125   1298    662    -7.5   36,000
 126    672    921   -14.7   33,500
 127   1000    717   -12.0   42,600
 128   1229    311    -6.4   66,100
 129   1422    832    -5.6   37,300
 130   1776    499    -1.4   57,000
 131   1930    757    -0.1   40,700
 132    660    537   -20.4   53,600
 133    666   1019   -20.2   29,700
 134   1271    862    -7.9   36,000
 135   1161   1369    -9.3   16,800
 136    453   1063   -29.7   28,100
 137   1658    623    -0.6   37,700
 138   1504    697    -4.6   43,700
 139   1488    707    -4.8   43,200
 140   1669    756    -2.4   40,700
 141    311   1417  <-35.0   15,800
 142   1366    915    -6.7   33,800
 143   1429    346    -5.7   77,900
 144    615   1017   -22.1   29,800
 145   2006    566    >0.0   51,600
 146   2006    518    >0.0   55,300
 147   1070   1108   -10.7   26,500
 148   1347    578    -6.9   50,600
 149    541   1461   -25.7   13,700
 150   1645    760    -2.8   40,500
 151   1269    236    -7.9  117,000
 152   1507    911    -4.5   33,900
 153   1722    448    -2.1   62,100
 154    932    503   -13.5   56,600
 155   1031    294   -11.4   91,400
 156   1970    664    >0.0   44,400
 157   1256    163    -6.1  162,400
 158   1275    417    -7.8   65,900
 159   1663    820    -2.6   37,800
 160   1034    527   -11.4   54,600
 161   1953    771    >0.0   40,000
 162   1020   1482   -11.6   13,700
 164   1566    606    -3.8   38,400
 166   1905    565    -0.2   51,700
 167   1340    181    -7.0  164,900
 168   1506    563    -4.6   50,400
 169   1336    678    -7.0   44,700
 170   1969    541    >0.0   53,500
 171    600    378   -16.3   71,800
 172    476    956   -28.7   32,100
 173    919   1314   -13.7   19,300
Master table of proteins in the rat liver database, showing spot master number, gel position (X and Y), [...] and
predicted molecular mass (from the standard curve of Fig. 8).



 MSN      X      Y   CPKpI    SDSMW
 174   1364    183    -6.7  162,900
 175    825    393   -15.7   69,300
 177   1582    553    -3.6   52,600
 178   1321    710    -7.2   43,000
 179   1069    615   -10.4   48,300
 180   1866    567    -0.5   51,600
 181    411    295   -32.1   91,200
 182    804    730   -16.2   42,000
 184   1860    896    -0.6   34,500
 185   1997   1017    >0.0   29,800
 186    279   1113  <-35.0   26,300
 187    773    296   -17.0   90,800
 188   1538    807    -4.2   38,400
 191   1560    674    -3.9   44,900
 192   1816    687    -0.9   44,200
 193   1469    555    -5.0   52,400
 194   1380    266    -6.4  101,600
 195    784    632   -16.7   47,300
 196   1227   1185    -6.4   23,700
 197    667    553   -20.1   52,600
 198   2006    681    >0.0   44,500
 199   1711    674    -2.2   44,900
 200    872    424   -14.7   65,000
 201    292    435  <-35.0   63,700
 202    736    253   -18.0  107,800
 203    786    829   -16.7   37,400
 204   1224    569    -6.5   50,000
 205    439    963   -30.9   31,100
 206   1994    571    >0.0   51,300
 207   1895    667    -0.3   44,200
 208    240   1418  <-35.0   15,800
 210   1700    499    -2.3   57,000
 211    902    517   -14.1   55,400
 213   1067    664   -10.4   44,400
 214   1340    668    -7.0   45,200
 215   1591    495    -3.5   57,300
 216   1585    755    -3.6   40,700
 217   1159    393    -9.3   69,300
 218    931    572   -13.5   51,200
 219    713    177   -18.7  170,500
 220   1479    911    -4.9   33,900
 221    965    927   -12.8   33,300
 223    934    716   -13.5   42,700
 225   1612   1045    -1.0   28,800
 226    621    411   -15.8   66,800
 227   1566   1483    -3.6   13,600
 228   1065    567   -10.8   51,600
 229   1577    690    -3.7   34,600
 230   1456    496    -5.2   57,300
 232   1440    649    -5.5   36,500
 234   1692    489    -2.4   57,900
 235    618   1004   -22.0   30,300
 236    920   1138   -13.7   25,400
 237    952   1008   -13.1   30,200
 238   1611    541    -3.2   53,500
 239   1489    720    -4.8   42,500
 240    501    446   -27.7   62,100
 241   1820    569    -0.9   51,400
 242   1357    656    -6.8   45,800
 243    711   1182   -18.7   23,800
 244   1655    621    -0.6   46,000
 245   1189    474    -4.9   59,300
 246    551    459   -25.1   61,000
 247   1348    604    -6.9   49,100
 248    460    446   -29.3   62,100
 249   1733    451    -1.9   61,800
 250   1974    788    >0.0   39,200
 251    806    392   -16.1   69,500
 252    674    553   -14.6   52,500
 253    753    646   -17.6   36,500
 254    995    450   -12.1   61,900
 255   1690    679    -2.4   44,600
 256    994   1006   -12.1   30,200
 257    506    464   -27.4   60,400
 258   1517    820    -4.4   37,800
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259 


1796 


961 


-1.1 


31.900 


260 


661 


1361 


•20.4 


17.700 


261 


1725 


679 


-2.0 


44,600 


262 


496 


1127 


.28.0 


25.800 


263 


1063 


172 


-10.9 


177.400 


265 


1390 


673 


-6.3 


45.000 


266 


510 


437 


-27.3 


63,400 


267 


660 


1038 


-20.4 


29.000 


268 


430 


961 


.31.0 


31.900 


269 


1044 


606 


-11.2 


48,900 


270 


2019 


853 


>0.0 


36.300 


571 


857 


422 


-15.0 


65.200 


272 
* 




968 


-14.2 


31.700 


274 


1292 


712 


-7.6 


42.900 


275 


1350 


590 


-6.9 


49.900 


276 


1670 


1069 


-2.6 


27.100 


277 




538 


-19.4 


53.700 


578 


961 


718 


-13.0 


42.600 


279 


£79 


570 


-14.5 


51.300 


261 


1648 


1064 


-07 


27,300 




1505 


525 


-4.6 


54.800 


283 


1313 


1147 


-7.3 


25.100 


284 


1314 


829 


•7.3 


37.400 


285 


1332 


408 


-7.1 


67.200 


286 


1277 


652 


-7.8 


46.100 


288 


1391 


824 


-6.3 


37.600 


289 


1147 


579 


-9.5 


50.700 


290 


825 


511 


-13.6 


55.900 


291 


787 


1476 


•16.6 


13.900 


PS2 


1462 


818 


-5.1 


37.800 




531 


446 


-26.3 


62.000 




860 


698 


-14.9 


43,600 


295 


1162 


609 


-9.3 


48,700 




218 


814 


<-35.0 


36,000 


2C7 


1377 


979 


-6.5 


31,300 


299 


913 


1523 


-13.9 


12,400 


300 


2012 


667 


>0.0 


45,300 


301 


702 


178 


•19.0 


1 69.200 


302 


494 


1280 


•28.1 


20.400 


303 


403 


1008 


-32.6 


30,100 


304 


1843 


1585 


-0.7 


10.300 


305 


1049 


593 


-11.1 


4S.600 


306 


1606 


989 


-3.3 


30.900 


307 


1219 


916 


-6.5 


33.700 


306 


1627 


755 


-3.0 


40.700 


309 


1524 


892 


-4.4 


34,700 


310 


1769 


1028 


-1.5 


2V400 


311 


1609 


1451 


-3 J 


14,700 


312 


266 


1406 


<-35.0 


16,100 


313 


1902 


1365 


-0.3 


17.600 


314 


1316 


1395 


-7.3 


16.600 


315 


1341 


523 


•7.0 


54.900 


316 


1104 


1053 


-10.1 


26,500 


320 


1480 


1459 


-4.9 


14,400 


321 


850 


603 


-15.1 


45,100 


322 


1454 


1494 


-5.3 


13,300 


323 


670 


626 


-20.0 


47,700 


324 


655 


101 


-20.6 


420.500 


325 


1521 


675 


-4.4 


44,800 


326 


1587 


677 


-3.6 


44,700 


327 


1388 


409 


-6.3 


67.000 


326 


448 


1291 


-30.0 


20.100 


330 


1608 


751 


-3.3 


40,900 


331 


1566 


697 


•3.8 


43.700 


332 


531 


471 


-26.3 


59.600 


333 


784 


1156 


-16.7 


24.700 


334 


1059 


407 


-10 S 


67,300 


335 


1593 


303 


•3.5 


88.500 


336 


1616 


598 


•3.2 


49,400 


338 


1854 


1004 


-0.6 


30.300 


339 


1265 


868 


-8.0 


34.900 


340 


561 


585 


-23.6 


50,300 


341 


1497 


1047 


-4.7 


26.700 


343 


1351 


265 


-6.8 


1C2.2O0 


344 


1613 


549 


-0.9 


52.800 
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345 


1006 


578 


•11.9 


50.800 


346 


1095 


640 


•10.3 


46.800 


347 


625 


728 


•21.7 


42.000 


348 


361 


963 


-35.3 


31,100 


349 


no 


1343 


«-35.0 


16.300 


390 


521 


1130 


-26.7 


25.700 


351 


912 


£19 


-13.9 


48.100 


352 


1574 


530 


-3.7 


54.300 


353 


961 


912 


•12.9 


33.900 


354 


706 


762 


-18.9 


40,400 


355 


1450 


630 


-S3 


37.300 


356 


1374 


1152 


-6.5 


24,900 


357 


474 


9S7 


-2E.7 


30.600 


358 


798 


346 


-16.3 


77.800 


359 


764 


338 


-17.3 


79.400 


360 


1384 


1068 


-6.4 


27.900 


361 


1713 


769 


-2.1 


40,100 


362 


1161 


859 


-9.3 


36.100 


363 


514 


1156 


-13.8 


24.800 


364 


412 


435 


-32.0 


63.700 


365 


741 


486 


-17.9 


58.200 


366 


678 


1503 


-14.6 


13.000 


367 


1560 


935 


-3.9 


33.000 


368 


963 


520 


-12.4 


55.200 


369 


434 


441 


-31.0 


63,000 


370 


639 


610 


-21.2 


48.700 


371 


1587 


860 


•3.6 


36,100 


372 


1875 


762 


-0.5 


40,400 


373 


1351 


1059 


-6.8 


28,300 


374 


1506 


715 


-4.6 


42.700 


375 


1823 


532 


-0.9 


54.200 


376 


254 


417 


<-35.0 


65.900 


377 


1409 


583 


-6.1 


50,400 


378 


621 


494 


-21 .8 


57.500 


379 


1017 


595 


-11.7 


45,600 


381 


953 


598 


-13.1 


49,400 


382 


856 


674 


-15.0 


44,900 


383 


1252 


258 


-8.1 


105.300 


384 


1699 


1518 


•2.3 


12.500 


385 


1042 


493 


-11.2 


57,500 


386 


1490 


563 


-47 


50,400 


367 


1554 


603 


-4.0 


45.100 


388 


1193 


404 


•8.9 


67.700 


389 


1374 


902 


-6.5 


34.300 


390 


1456 


969 


-5.2 


31.700 


391 


718 


690 


-18.5 


44,000 


392 


1799 


732 


•1.1 


41.900 


393 


1482 


758 


-4.8 


40.600 


394 


1227 


1461 


-6.4 


14.400 


395 


1530 


577 


-4.3 


50.800 


396 


1410 


755 


-6.0 


40.800 


397 


912 


256 


-13.9 


106.400 


399 


1465 


1063 


•5.0 


2e.!00 


400 


1473 


450 


-4.9 


61.900 


401 


1029 


1140 


-11.5 


25.300 


403 


1516 


754 


-4.4 


40,800 


404 


1495 


554 


-47 


52.500 


405 


1525 


1092 


-4.3 


27.100 


406 


723 


252 


-18.4 


108,000 


409 


650 


663 


-20.8 


45.500 


410 


1501 


478 


-4.6 


55.000 


411 


S36 


1057 


-13.4 


26.300 


412 


350 


1120 


-35.9 


26.000 


413 


1033 


538 


-11.4 


53700 


415 


737 


425 


-18.0 


64,900 


416 


1578 


606 


•3.7 


48.900 


417 


646 


496 


-21.0 


57.300 


418 


1695 


482 


-2.3 


56,600 


419 


725 


770 


•16.3 


40.000 


420 


1289 


1041 


-7.7 


2e.9O0 


421 


1171 


912 


-9.1 


33.900 


422 


599 


162 


-22.8 


193.700 


423 


929 


856 


•13.6 


36,200 


424 


739 


625 


-17.9 


47,700 


425 


1490 


965 


-47 


31.800 
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426 


1296 


704 


-7.0 


AA W\ 


427 


610 


843 


•16.0 




428 


1565 


303 


-3.9 




429 


1256 


847 


4.0 


Vc am 


430 


1253 


562 


-8.1 




431 


734 


1426 


-16.1 


1* cm 


432 


483 


433 


-28.5 




434 


516 


1041 


-26.9 




435 


1020 


1170 


-11 .6 
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1122 


1 




»4 7,800 


437 


1870 




•v.s 


45,000 




A I* 

•JO 


1 1U2 


-31 .0 


26.700 
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<-35.0 


36.600 


Air\ 


1 / *HJ 


r a a 
5*4 


-1 .8 


53.200 


441 




1571 


•22.8 


10.800 


A At 
m *J 


/4J 


335 


-17.8 


80.100 


lit 

*4o 


801 


668 


-16.2 


45.200 


447 


1050 


926 


-11.1 


33.300 


448 


1245 


1296 


-6.2 


19.800 


A A ft 
449 


1576 


1516 


-3.7 


12.600 


4&Q 


1818 


1021 


-0.9 


2S.600 


451 


1094 


440 


-10.3 


63,100 


Att 

452 


1*45 


802 


>0.0 


38,600 


453 


1652 


894 


-2.8 


34.600 


A\ *A 
454 


1403 


500 


-6.1 


56.900 




1394 


718 


-6.3 


42,600 


AKJ 


one 
905 


436 


-14.0 


63.500 


459 


1038 


581 


-11.3 


50.500 


460 


1596 


294 


-3.4 


91,400 


461 


1526 


863 


-4.3 


35.900 


462 


1098 


1137 


-10.2 


25.400 


463 


849 


1125 


-15.2 


25.800 


464 


1814 


1072 


-0.9 


27.800 


465 


1388 


481 


-6.3 


58.700 


466 


1194 


1084 


-8.9 


27.300 


468 


577 


467 


•23.9 


60.100 


489 


1140 


888 


-9.6 


34.900 


470 


1797 


524 


-1.1 


54.800 


471 


1293 


1133 


-7.6 


25,500 


472 


616 


655 


-21 .9 


46.000 


473 


2009 


299 


>0.0 


89.900 


474 


1205 


215 


-87 


131.300 


475 


1035 


788 


-11.4 


39.200 


476 


160 


155 


<-35.0 


207.600 


A^J 

477 


469 


1370 


•28.9 


17,400 


478 


599 


662 


-22.8 


45.600 


47S 


1009 


540 


-11.8 


53.500 


480 


1216 


235 


-8.6 


117.400 


4o2 


616 


346 


-15.9 


77.800 


AE11 


09J 


673 


-19.3 


44.900 


JfiC 
•Os 


• DUO 


1013 


-3.3 


30,000 


486 


AfU. 




-28.6 


49,300 


487 


> 




•11 .5 


48,800 
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1 fwc. 


1 lOD 


-1 1 .2 


23,700 


AAQ 

•w 


1 ouy 




-3.3 


69.200 


AQft 


/ /a 


1 289 


-17.0 


20.100 


«yi 


692 


178 


-19.3 


169.300 




1 100 


964 


•10.2 


31.800 


4S3 


1 760 


776 


-1.6 


39.700 


494 


682 


247 


-14.5 


110.700 


495 


470 


1258 


-28.9 


21,200 


496 


494 


1436 


-28.1 


15.200 


aC7 


SOU 


652 


•12.5 


36.400 
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1414 


54c 


-6.0 


53,100 


w\ 
>x> 




1072 


-8.3 


27,800 


501 


1 246 


659 


-c.* 


a r 7m 
45.700 


502 


824 


792 


•15.7 


39.000 


503 


1246 


1134 


-8.2 


25.500 


504 


1115 


1407 


-99 


16.200 


505 


1189 


391 


-8.9 


69.700 


506 


1578 


402 


•37 


68.000 


507 


787 


250 


-16.6 


109.000 


508 


579 


552 


•12.5 


52.600 


509 


1153 


619 


-9.4 


48.100 


510 


1730 


1006 


-2.0 


30.200 
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484 


£12 


1099 


533 


513 


1696 


1034 


514 


946 


636 


515 


461 


543 


516 


1334 


1044 


517 


666 


1021 


510 


796 


779 


519 


622 


670 


520 


632 


165 


521 


1332 


830 


522 


603 


1104 


523 


1190 


309 


524 


479 


1226 


525 


766 


1066 


526 


747 


1016 


527 


1170 


231 


520 


1502 


542 


530 


1728 


620 


S32 


507 


1011 


533 


870 


489 . 


534 


1347 


1065 


535 


1513 


346 


536 


306 


654 


538 


iesi 


669 


530 


1463 


962 


540 


909 


561 


541 


€25 


289 



542 1164 

543 803 



198 
655 

544 1259 1143 
£56 1526 
803 



545 
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547 1162 



1071 
274 



548 128 1321 < 

549 1355 1122 

550 595 
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494 
405 
410 

975 

557 1477 1030 
960 
700 



552 1369 

£53 992 

555 1125 

£56 705 



556 
559 



560 1028 
562 698 



564 

565 
566 



789 
777 
960 



567 1519 



569 
570 
571 



1212 
760 
618 



573 1142 

574 532 



575 
£76 
577 
578 



771 
1066 
622 
914 



579 1064 
560 1524 



1392 
962 

1467 
758 



581 
582 
564 
585 
566 
587 

568 1886 

589 642 

590 1317 

591 65 

592 1014 



563 
1109 
621 
794 
1446 
766 
328 
611 
661 
594 
656 
771 
787 
250 
534 
734 
754 
794 
714 
783 
666 
672 
731 



•16.0 
•10.2 
-&3 
-13.2 
•26.5 
-7.1 
-14.8 
-16.3 
-15.7 
-21.5 
-7.1 
-22.6 
*6.9 
-26.6 
-17.2 
-17.7 
-9.2 
-4.6 
-2.0 
-27.4 
-14.7 
-6.6 
-4.5 
<-35.0 
-0.7 
-5.1 
-13.9 
-21.7 
-9.2 
-16.2 
-8.0 
-15.0 
-16.2 
-6.3 
-35.0 
-6.8 
-23.0 
-6.6 
-12.2 
-6.8 
-18.9 
-4.9 
-12.5 
-19.1 
-11.5 
-14.1 
-16.6 
-16.9 
-12.5 
-4.4 
-6.6 
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-12.8 
•20.0 
•8.7 
-13.9 
-22.3 
-7.7 
-15.8 
-12.6 
-32.6 
<-35.0 
-15.3 
-9.8 
-12.1 
-3.2 
-17.7 
•10.8 
4.8 
-1.6 
-6.9 
-11.5 
•17.9 
-15.9 
-16.7 
-9.3 
-10.4 
•11.5 
•15.2 
-14.1 
-14.4 
-0.9 
4.7 
-22.0 
-12.8 
-12.7 
-1.9 
■21.1 
-15.8 
-14.6 
<-35.0 
4.4 
294 
•19.7 
-0.9 
-11.4 
-3.0 
-7.4 
-2.0 
-11.7 
-3.7 
-16.8 
-9.7 
•15.9 
•16.7 
-7.7 



37.500 
36.000 
5G.6O0 
57.100 
57.700 
100.300 
65.100 
41.600 
76.200 
45.400 
151.000 
213.000 
43.400 
53.000 
42.900 
37.900 
174.900 
65.700 
67.100 
83.900 
80.500 
24.800 
106.600 
36,700 
210.300 
28.700 
136.900 
115.300 
63.400 
51,600 
57.400 
31.200 
91.100 
45,400 
46,700 
25.300 
46.700 
33.900 
12.800 
84,700 
26.600 
24,600 
52,400 
74,900 
84,500 
33.300 
43,400 
38.200 
60.700 
36.600 
50,700 
56.500 
93.100 
92.700 
40.000 
56,900 
23.700 
58.100 
96,400 
46.600 
41.200 
53.500 
45.600 
25.800 
47.200 
30.700 
25,500 
65.000 
41.300 
22.500 
58.400 
591.300 
84.600 
62.400 
41.500 
MSN  X  Y  CPKpI  SDSMW



1026 
1C27 
1C28 
1030 
1C31 
1G32 
1C33 
1034 
1025 
1G36 
1035 
1040 
1041 
1044 
1045 
1047 
104€ 
1040 
1050 
1061 
1052 
1053 
1054 
1055 
1056 
1056 
1060 

ioei 
ioe2 

1064 
1065 
1066 
1067 
1066 
1069 

1C71 

1073 

1075 

1C76 

1076 

1061 

1063 

1065 

1090 

1092 

1093 

1094 

1095 

1096 

1099 

1101 

1102 

1103 

1105 

1106 

1107 

1106 

1111 

1112 
1115 
1116 
1117 
1116 
1119 
1120 
1121 
1122 
1123 
1125 
1126 
1126 
1133 
1139 
1147 
1146 



405 
1296 
656 
1284 
966 
1547 
1361 
1525 
1126 
1226 
1761 
541 
616 
1036 
1439 
1540 
1576 
1069 
949 
426 
1563 
779 
1C13 
1360 
264 
1261 
393 
1617 
1245 
1256 
705 
1181 
529 
506 
1696 
673 
1768 
636 
1663 
£26 
971 
1697 
1157 
620 
1867 
2019 
1546 
1545 
61 
1954 
588 
1050 
457 
1884 
1714 
1717 
1976 
547 
1346 
1385 
1076 
975 
1202 
1022 
1905 
1512 
1114 
1464 
1046 
1122 
1722 
1096 
1830 
764 
1966 



552 



547 
226 
622 
403 
551 
496 
645 
274 
262 
639 
910 
465 
407 
250 
635 
411 
1040 
618 
1365 
1092 
620 
377 
663 
746 
605 
645 
746 
792 
934 
734 
656 
696 
604 
609 
1128 
773 
861 
566 
483 
202 
794 
910 
597 
894 
536 
477 
935 
237 
1046 
667 
797 
532 
649 
546 
722 
1066 
621 
762 
816 
787 
933 
1076 
616 
1301 
677 
452 
657 
602 
892 
825 
569 
1162 
724 



-3215 
.7.5 

-15.0 
-7.7 

-12.3 
-4.1 

-6.4 
-4.3 

-9.7 
4.5 
-1.6 
-25.7 
-15.8 
-11.3 
-5.5 
-4.2 
-3.7 
-10.4 
-13.2 
-31.1 
-3.6 
-16.6 
^3.2 
-6.5 
<-35.0 
4.0 
-33.3 
-0.9 
-6.2 
4.1 
•16.9 
-9.0 
-26.3 
-27.4 
-0.3 
-14.7 
-1.5 
-15.4 
-0.6 
-15.7 
-12.7 
-2.3 
-9.4 
-21.9 
-0.5 
>0.0 
-4.1 
-4.1 
<-35.0 
>0.0 
-23.3 
-11.1 
•29.5 
-0.4 
-2.1 
-2.1 
>0.0 
-25.3 
-6.9 
4.4 
-10.6 
-12.6 
4.7 
-11.6 
-0.3 
-4.5 
-9.9 
-5.1 
•11.1 
-9.6 
-2.1 
-10.2 
-0.6 
•17.3 
>0.0 



£2.600 
3€ f 500 
53,000 
123.200 
37,700 
67,900 
£2.700 
£7,200 
46.500 
96,300 
1C3.6O0 
36.900 
34,000 
£6.300 
£7,300 
109.200 
47,100 
66.700 
2E.900 
37,800 
16.900 
17,000 
4* .000 
72,000 
45,500 
41,200 
49.000 
46.600 
41.200 
39,000 
33.000 
'1.600 
45.800 
43,700 
49,100 
46,700 
25.800 
39.900 
36.000 
£1.600 
56.500 
142,300 
36.900 
34.000 
49,500 
34,600 
53.700 
59,100 
33.000 
116.000 
28.600 
45.200 
36.800 
54.200 
46.300 
53.100 
42.400 
28,000 
48.000 
40,400 
36.000 
39.300 
33.100 
27.600 
46,300 
19.700 
44.700 
61.700 
36.200 
38.600 
34,700 
37.500 
51.400 
23.800 
42.300 



1153 
1154 
1161 
1162 
1163 
1168 
1170 
1171 
1172 
1174 
1176 
1177 
1178 
1179 
1180 
1181 
1162 
1163 
1164 
1165 
1166 
1189 
1190 
1191 
11S2 
1193 
1194 
1195 
1196 
1197 
1196 
1199 
1200 
1201 
1202 
1203 
1204 
1205 
1206 
1209 
1210 
1211 
1212 
1214 
1215 
1216 
1217 
1218 
1219 
1220 
1221 
1222 
1223 
1224 
1225 
1226 
1227 
1228 
1229 
1230 
1231 
1232 
1233 
1234 
1235 
1236 
1237 
1238 
1239 
1240 
1241 
1242 
1243 
1244 
1245 



921 
1594 
637 
623 
665 
564 
552 
536 
545 
109S 
1304 
1566 
1606 
1465 
1459 
1431 
1407 
1353 

1454 

1422 
1394 
1171 
1467 
686 
265 
403 
344 
505 
£72 
639 
637 
614 
637 
1095 
1719 
791 
964 
313 
306 
320 
326 
394 
402 
386 
641 
660 
914 
873 
970 
1021 
1392 
1354 
1362 
673 
614 
603 
696 
707 
475 
466 
759 
1324 
1583 
1865 
1812 
1411 
1392 
794 
769 
740 
743 
713 
682 
663 
565 



1158 
664 
400 

397 
397 
£26 
£29 
£24 
£14 
£22 
566 
539 
7C2 
224 
224 
223 
223 
224 
162 
163 
162 
214 
286 
1114 
693 
1292 
1275 
1311 
1293 
1502 
1402 
1407 

1431 

1394 
1545 
666 
1021 
195 
194 
197 
197 
294 
294 
294 
329 
329 
266 
245 
372 
296 
205 
203 
205 
540 
542 
539 
623 
628 
447 
1262 
1461 
1170 
1005 
809 
817 
703 
682 
410 
407 
406 
511 
510 
509 
504 
582 



-13.7 
-3.5 
-21.3 
-21.6 
•20.2 
•24.4 

•25.0 
•25.9 
•2£.5 
-10.2 
-7.5 
-6.6 
-3.3 

-<.e 

5.2 
-5.7 
4.1 
4.4 
-5.3 
-5.8 
4.3 
-9.2 
-5.2 
•19.5 
<-35.0 
-32.6 
<-35.0 
•27.6 
-24.1 
-21.2 
-21.3 
•22.1 
-21.3 
-10.3 
-2.1 
-16.5 
-12.9 
<-35.0 
<-35.0 
<-35.0 
<-3£.0 
-33.2 
-32.7 
-33.7 
-21.2 
-20.4 
-13.6 
-14.7 
-12.7 
•11.6 
4.3 
4.6 
4.7 
-19.9 
-22.1 
-22.6 
•19.2 
-18.9 
-26.7 
-29.0 
-17.4 
-7.2 
•3.6 
-0.6 
-1.0 
4.0 
4.3 
-16.4 
•17.1 
-17.9 
-17.6 
-18.7 
-19.6 
-20.3 
-24 4 



24,700 
35,900 
66.400 
66,800 
66,700 
54.500 
54.500 
54.800 
55.700 
55.000 
5C.200 
53,700 
43,400 
124.900 
124.900 
125.100 
125.200 
124,700 

164.400 

162.600 
164,300 
131.800 
94.200 
26.200 
34,700 
20,000 
20.600 
19,400 
20.000 
13.000 
16,300 
1E.200 
15.400 
16,600 
11.600 
45.200 
29,700 
146,700 
149,800 
147,400 
146.600 
91.400 
91.200 
91,400 
61,600 
61,600 
101,800 
112,000 
72.900 
90,100 
139.500 
141,800 
139.500 
53.600 
53.400 
53.600 
47.800 
47,500 
62.300 
20.400 

14.400 

24.200 
3C.300 
36.200 
37.900 
43.400 
44,500 
66,900 
67.300 
67,500 
55,900 
56.000 
56,100 
56.500 
50.500 



1246 


c^7 

9m / 


577 


-25 J 


1247 


MO 


575 


-26 J 


1240 


516 


572 


-27.0 


15^0 


979 


536 


-12.7 


1 9Ci 
i £91 


607 


532 


-22.4 


1Z5Z 


665 


529 


-20.2 


1253 


B99 


766 


-14.1 


1254 


1311 


746 


-7.4 


1255 


1300 


761 


-7.5 


1257 


1936 


712 


0.0 


1258 


1806 


718 


-1.0 


1259 


1727 


715 


•2.0 


1260 


1629 


713 


-3.0 


1261 


1555 


717 


-4.0 


1262 


1466 


717 


-5.0 


1263 


1413 


722 


4.0 


1264 


1340 


717 


-7.0 


1265 


1263 


717 


4.0 


1266 


1162 


720 


-9.0 


1267 


1110 


717 


•10.0 


1266 


1055 


717 


•11.0 


1269 


999 
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•12.0 


1270 


959 


715 


-13.0 


1271 


905 


712 


•14.0 


1272 


es7 


714 


-15.0 


1273 


610 


705 


-16.0 


1274 


774 


711 
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737 


706 


-18.0 


1278 


702 


711 


-19.0 


1279 


671 


710 


.50 0 


1280 


645 


710 




1261 


617 


707 


-55 O 


1262 


595 


704 


-23.0 


1263 


573 


700 


-24.0 


1284 


552 


695 


-25.0 


1285 


536 


694 


•26.0 


1266 


515 


687 


-27.0 


1287 


496 


6S3 


•28.0 


1286 


467 


669 


-29.0 


1269 


447 


667 


-30,9 


1290 


427 


655 


-31.0 


1291 


412 


655 


-32.0 


1292 


397 


652 


-33.0 


1293 


381 


6S4 


-34.0 


1294 


365 


653 


•35.0 


1295 


346 


653 


<-35.0 



50.800 
50.900 
51.200 
53.900 
54.200 
54.400 

40.200 

41.200 

40,400 

42.900 

42.600 

42.700 

42.800 

42.600 

42.600 

42.400 

42.600 

42.600 

42.500 

42.600 

42.600 

42.600 

42.700 

42.900 

42.800 

43.300 

42,900 

43,100 

42.900 

43,000 

43,000 

43.100 

43.300 

43.500 

43.700 

43.800 

44.200 

44,400 

45.200 
45.300 
45,900 
45.900 
46.100 
46.000 
46,100 
46.100 
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6.03 
Table 4. Computed pIs of some known proteins related to measured CPK pIs

Columns: Protein Name; PIR Name; #ASP (3.9*); #GLU (4.1*); #HIS (6.0*); #LYS (10.8*); #ARG (12.5*); Calc pI; Real CPK

0 Creatine phosphokinase (CPK), rabbit muscle
1 Fatty acid-binding protein, rat hepatic
2 b2-microglobulin, human
3 Carbamoyl-phosphate synthase, rat
4 Proalbumin (serum albumin precursor), rat
5 Serum albumin, rat
6 Superoxide dismutase (Cu-Zn, SOD), rat
7 Phospholipase C, phosphoinositide-specific (?), rat
8 Albumin, human
9 Apo A-I lipoprotein, rat
10 proApo A-I lipoprotein, human
11 NADPH cytochrome P-450 reductase, rat
12 Retinol binding protein, human
13 Actin beta, rat
14 Actin gamma, rat
15 Apo A-I lipoprotein, human
16 Apo A-IV lipoprotein, human
17 Tubulin alpha, rat
18 F1 ATPase beta, bovine
19 Tubulin beta, pig
20 Protein disulphide isomerase (PDI), rat hepatic
21 Cytochrome b5, rat
22 Apo C-II lipoprotein, human

* Amino acid pK assumed in computation
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LFHUA1 
RDRT04 
VAHU 
ATRTC 
ATRTC 
LPHUA1 
LPHUA4 
UBRTA 
FWBOB 
UEPGB 
ISRTSS 
CERTS 
LPHUC2 
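
The computed pIs of Table 4 rest on counts of charged residues and the assumed side-chain pK values listed in its header (Asp 3.9, Glu 4.1, His 6.0, Lys 10.8, Arg 12.5). A minimal sketch of such a calculation is given below. It is not the authors' program: it assumes simple Henderson-Hasselbalch titration of each residue, ignores the chain termini and all other ionizable groups, and the residue counts in the usage note are hypothetical.

```python
# Assumed side-chain pK values, taken from the header of Table 4.
PK = {"ASP": 3.9, "GLU": 4.1, "HIS": 6.0, "LYS": 10.8, "ARG": 12.5}
ACIDIC = ("ASP", "GLU")         # carry -1 when deprotonated
BASIC = ("HIS", "LYS", "ARG")   # carry +1 when protonated


def net_charge(counts, ph):
    """Net charge at a given pH from residue counts (termini ignored)."""
    charge = 0.0
    for res in BASIC:
        # fraction protonated = 1 / (1 + 10^(pH - pK))
        charge += counts.get(res, 0) / (1.0 + 10.0 ** (ph - PK[res]))
    for res in ACIDIC:
        # fraction deprotonated = 1 / (1 + 10^(pK - pH))
        charge -= counts.get(res, 0) / (1.0 + 10.0 ** (PK[res] - ph))
    return charge


def computed_pi(counts, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH at which the net charge is zero.

    Net charge falls monotonically as pH rises, so bisection is safe.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(counts, mid) > 0:
            lo = mid   # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, `computed_pi({"ASP": 30, "GLU": 30, "HIS": 5, "LYS": 10, "ARG": 10})` returns an acidic pI for this hypothetical composition, while swapping the acidic and basic counts gives a basic one.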
The sensitivities of immunoassays relying on conventional radioisotopic labels (i.e.
radioimmunoassay (RIA) and immunoradiometric assay (IRMA)) permit the measurement of
analyte concentrations above ca. 10^7 molecules/ml. This limitation primarily derives, in the
case of competitive or 'limited reagent' assays, from the 'manipulation' errors arising in the
system combined with the physicochemical characteristics of the particular antibody used;
however, in the case of 'non-competitive' systems, the specific activity of the label may play a
more important constraining role. It is theoretically demonstrable that the development of
assay techniques yielding detection limits significantly lower than 10^7 molecules/ml depends
on:

(1) the adoption of 'non-competitive' assay designs;

(2) the use of labels of higher specific activity than radioisotopes;

(3) efficient discrimination between the products of the immunological reactions
involved.

Chemiluminescent and fluorescent substances are capable of yielding higher specific activities
than commonly used radioisotopes when used as direct reagent labels in this context, and
thus provide a basis for the development of 'ultrasensitive', non-competitive immunoassay
methodology. Enzymes catalysing chemiluminescent reactions or yielding fluorescent
reaction products can likewise be used as labels yielding high effective specific activities and
hence enhanced assay sensitivities.

A particular advantage of fluorescent labels (albeit one not necessarily confined to them) lies
in the possibility they offer of revealing immunological reactions localized in 'microspots'
distributed on an inert solid support. This opens the way to the development of an entirely new
generation of 'ambient analyte' microspot immunoassays permitting the simultaneous
measurement of tens or even hundreds of different analytes in the same small sample, using
(for example) laser scanning techniques. Early experience suggests that microspot assays with
sensitivities surpassing that of isotopically based methodologies can readily be developed.

Keywords: Ultrasensitive immunoassay; fluorescent microspot immunoassay; confocal
INTRODUCTION

[...] labels have played a major role in medicine and
other biologically related fields ([...]
industries, etc.) during the past two decades.
Their importance has derived from the exploitation
both of the 'structural specificity' characterizing
antibody-antigen reactions and of the detectability
of isotopically labelled reagents, the latter
permitting observation of the binding reactions
[...] methods with unique specificity and sensitivity
characteristics, and accounts for their ubiquitous
use throughout modern medicine and biology.
However, in the past few years interest has
increasingly focused on so-called 'alternative',
non-radioisotopic, immunoassay methods. Such
techniques are based on essentially similar
analytical principles but differ in the [...] used
to label the particular immunoreactant (antibody
or analyte) whose distribution between bound
and free moieties (following the basic analytical
reaction) constitutes the assay response. The
reasons for this interest may be grouped under
four headings:
[...] fundamental reason for their remarkable [...]
paradoxically, from the [...]
develop microanalytical techniques [...]

[...] 1984; Ekins, 1985). Nevertheless, some of the
underlying concepts are still frequently misunderstood
and merit brief discussion in the present
context.



The concept of sensitivity 



(2) [...]

(3) [...] development of multi-analyte assay [...]

[...] (4), and this presentation will centre primarily on
the concepts which underlie our [...]
development strategy in these [...].

THE ATTAINMENT OF 'ULTRA-HIGH'
IMMUNOASSAY SENSITIVITY

Though, as indicated above, the [...]
radioisotopically based [...]
has constituted one of the principal foundations
of their widespread use, [...]



[...] confusion has been [...]
equating assay sensitivity
with the slope of the dose-response curve (Yalow
and Berson, 1970; [...];
see also Ekins, [...]). [...]
widely agreed that the notion [...] a steeper
dose-response curve [...]
erroneous. The [...] clearly
revealed by the fact that the [...]
of the responses yielded [...]
dependent on the [...] variable which is
chosen to represent [...] (see Fig.
1(a); Ekins, [...]). For these [...] reasons it
has long been recognized that the 'sensitivity' of
an assay can only be satisfactorily represented by
its lower limit of detection. This
concept is now embodied in all internationally
agreed definitions of the term [...]

[...] 'precision profile' (Fig. 2(a)) [...]
[...] dose estimate (Fig. 2(b)).
[Figure 1. Panel labels: 'B/F plot', 'Any plot'; panels a and b.]


Figure 1. (a) Diagrammatic representation of conventional RIA dose-response curves for systems using high and low
antibody concentrations plotted in terms of free/bound (F/B) and bound/free (B/F) [...]. The lower
amount of antibody yields a dose-response curve of greater slope in the F/B plot, but of lower slope in the B/F plot. It is
impossible to decide, on the basis of the data shown in this figure, which concentration of antibody yields the assay system of
higher sensitivity. (b) The sensitivity of an assay is essentially represented by the minimum detectable dose, [...] the
dose measurement (SD_dose) at zero dose. This is given by the SD of the response (SD_R) [...]
curve slope at zero dose (i.e. (SD_R × dD/dR)_0). This quantity is unaffected [...].

[...] It is common to multiply (SD_dose)_0 by an arbitrary factor to indicate the [...]
attaching to the minimum detectable dose estimate, though, since no agreement exists regarding the value of this factor, this
unnecessary step merely adds to confusion when the relative sensitivities of two assay procedures are compared.
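
The definition in Figure 1(b), sensitivity as the SD of the response at zero dose divided by the slope of the dose-response curve at zero dose, can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the B/F curve, its parameters and the response SD are hypothetical, and the SD of the transformed (F/B) response is obtained by first-order error propagation. It shows that the minimum detectable dose comes out the same whichever response variable is plotted.

```python
import math


def bound_over_free(dose, k=1.0, ab=0.5):
    """Hypothetical B/F dose-response curve (illustrative form only)."""
    return k * ab / (1.0 + k * dose)


def free_over_bound(dose):
    """The same data expressed as the reciprocal response, F/B."""
    return 1.0 / bound_over_free(dose)


def slope_at_zero(response, h=1e-6):
    """Forward-difference estimate of dR/dD at zero dose."""
    return (response(h) - response(0.0)) / h


def min_detectable_dose(sd_response, slope):
    """(SD_R x dD/dR) at zero dose, i.e. SD_R divided by |slope|."""
    return sd_response / abs(slope)


# Assumed SD of the B/F response at zero dose (hypothetical value).
sd_bf = 0.01
# First-order propagation to the reciprocal: SD(1/R) ~ SD(R) / R^2.
sd_fb = sd_bf / bound_over_free(0.0) ** 2

sens_bf = min_detectable_dose(sd_bf, slope_at_zero(bound_over_free))
sens_fb = min_detectable_dose(sd_fb, slope_at_zero(free_over_bound))
```

The two slopes differ in both sign and magnitude, yet `sens_bf` and `sens_fb` agree to within the finite-difference error, which is the point made in the caption.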



'Competitive' and 'non-competitive' ('limited 
reagent' and 'excess reagent') assays 

A second important misconception in this area is 
the notion that immunoassays relying on the use 
of labelled antibodies (e.g. immunoradiometric 
assays, IRMA) are ipso facto more sensitive than 



those which rely on the use of labelled 'analyte' 
(e.g. radioimmunoassays, RIA); furthermore the 
grounds originally advanced for the claimed 
superiority of labelled antibody methods (Miles 
and Hales, 1968) were partially based on false 
concepts of sensitivity, and thus failed to identify 
the true reasons why certain assay designs are 



[Figure 2. Panels (a) and (b): error in dose measurement (ΔD) plotted against dose, with the working range marked.]




Figure 2. (a) The 'precision profile' of an assay portrays the error in the dose measurement as a function of dose. The error
may be represented, inter alia, by the absolute error (ΔD; e.g. SD of D) or the relative error (ΔD/D; e.g. CV of D). (ΔD)_0, the
error in the measurement of zero dose, represents the sensitivity of the assay. The working range may be defined in terms
of dose values within which ΔD/D is less than an 'acceptable' value set by the investigator. (b) The more sensitive of the
[...] at a lower value. However, assay [...] is more precise at higher values of dose, and [...]
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potentially capable of yielding far higher sensitiv- 
ity than others. This issue likewise merits 
clarification. 
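
The precision profile and working range defined in the caption to Figure 2 lend themselves to a short numerical sketch. The error model below is purely an assumption chosen to give a U-shaped relative-error curve (a constant error at zero dose, a proportional term and a term growing at high dose); none of its parameters come from the text.

```python
import math


def absolute_error(dose, sd_zero=0.5, cv_floor=0.05, curvature=0.001):
    """Hypothetical absolute error (SD of D) as a function of dose.

    sd_zero is the error at zero dose, i.e. the assay sensitivity.
    """
    return math.sqrt(
        sd_zero ** 2 + (cv_floor * dose) ** 2 + (curvature * dose ** 2) ** 2
    )


def relative_error(dose):
    """Relative error (CV of D); undefined at zero dose."""
    return absolute_error(dose) / dose


def working_range(doses, max_cv=0.10):
    """Lowest and highest dose on the grid with CV below max_cv."""
    acceptable = [d for d in doses if d > 0 and relative_error(d) < max_cv]
    if not acceptable:
        return None
    return min(acceptable), max(acceptable)


# Doses on an arbitrary integer grid; the limit of 10% is the
# investigator's choice of an 'acceptable' value.
limits = working_range(range(1, 201), max_cv=0.10)
```

Tightening `max_cv` narrows the working range from both ends, whereas the sensitivity (`sd_zero`) is fixed by the error at zero dose alone.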

The purely pragmatic sub-classification of
immunoassays into labelled antibody and labelled
analyte methods diverts attention from a more
fundamental divide in immunoassay methodology,
which relates to the optimal concentration of
antibody required in an assay system to maximize
its sensitivity. In certain assay designs (which may
be termed 'limited reagent' or 'competitive') the
optimal concentration tends to zero; conversely, in
others (which may be termed 'excess reagent' or
'non-competitive') the concentration tends to
infinity. It should be particularly emphasized that
the optimal antibody concentration is essentially
governed not only by the physicochemical
characteristics of the antibody-analyte binding reaction
but also by the errors incurred in measurement of
the assay response. Were an assay system to be
[...] error-free, [...] antibody concentration
would be optimal, and the distinction between
competitive and non-competitive methodologies
would thus not arise.

Though it is inappropriate in this presentation
to discuss in detail the statistical and
physicochemical theory underlying this fundamental
[...] (see e.g. Jackson et al., 1983), the reason
for it may perhaps be more [...] understood if
the basic principles of immunoassay are portrayed
in a somewhat different way from that in which
they are usually presented. All immunoassays
essentially depend upon measurement of the
'fractional occupancy' by analyte of antibody
binding sites following reaction of analyte with
antibody (see Fig. 3(a)). Those techniques which
implicitly rely on measurement of residual,
unoccupied, binding sites optimally necessitate
the use of concentrations of antibody tending to
zero and may be termed 'competitive'; conversely,
those in which occupied sites are directly
measured necessitate use of high antibody
concentrations [...] (Fig. 3(b)). This emphasizes that the differences
in assay design characterizing so-called competitive
and non-competitive methods are essentially
unrelated to which component (if any) of the
reaction system is labelled. Indeed immunoassays
in which no label of any kind is involved can on
identical grounds be subdivided into those of
'limited reagent' (or 'competitive') and 'excess
reagent' (or 'non-competitive') design. Thus the
distinction between these two forms of
immunoassay simply reflects differences in the [...]
antibody occupancy is determined.
[...] It is generally undesirable, for
reasons of accuracy, to measure a small quantity
by estimating the difference between two [...]

[Figure 3 diagram. Upper panel: measurement of occupied sites (two-site labelled antibody assay; single-site labelled antibody assay): NON-COMPETITIVE IMMUNOASSAY, antibody concentration tending to infinity for maximal sensitivity. Lower panel: measurement of unoccupied sites (single-site labelled antibody assay; labelled antigen; labelled anti-idiotypic antibody): COMPETITIVE IMMUNOASSAY, antibody concentration tending to zero for maximal sensitivity. Key: labelled antibody; analyte.]



antibody binding- s , Ie occupant i s m ' '* ^ h ° W 
antibody methods arl nn J measured. Labelled 

the (labelled 6 . a °n d ,fbod v a^TasZTtur S ' teS 0f 

(below nght) when uwcuM ^,!^* c ™t»™* 
antigen (below left) or ttL^Ll s are .^^ed. Labelled 
methods (below centre rlt Sf anXHti > 0X W antibody 
unoccupied by ana^e ant a?e JEST"?" 1 0f si,es 
•competitive' design. therefore mvariably. of 
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'Competitive* immunoassay % Non-competitive* immunoassay 




abel 4* tRMA 




Figure 4. Curves showing the theoretically predicted relationship between antibody affinity and the sensitivities achievable
using 'competitive' and 'non-competitive' assay strategies. The 'potential' sensitivity curves assume the use of infinite specific
activity labels; the sensitivities achievable using 125I-labelled antigen or antibody are also shown. Shaded areas indicate the
[illegible] errors in measurement of the label. Curves relating to 'competitive' assays assume a 1% error in
[illegible] from 'experimental' errors (i.e. errors other than those inherent in label
measurement per se). Non-competitive curves assume 'non-specific binding' of labelled antibody of 0.01% and 1% (lower and
upper curves) respectively. Arrows indicate sensitivities claimed for typical non-competitive immunoassay methodologies.



Conversely, when occupied sites are measured 
directly, this particular constraint does not arise; 
indeed, considerable advantage often derives 
from using relatively large amounts of antibody in 
the system. 



Sensitivity of 'competitive' and 
'non-competitive' immunoassays 

Competitive and non-competitive immunoassays 
differ significantly in many of their performance 
characteristics in consequence of the differences 
in optimal antibody concentration on which they 
rely. Most particularly they differ in their 
potential sensitivities. Figure 4 portrays the
sensitivities predicted theoretically as a function
of antibody binding affinity, making realistic
assumptions regarding the experimental errors
incurred in reagent manipulation, 'non-specific'
binding of labelled antibody, etc., and assuming
the use of optimal reagent concentrations (Ekins,
1985). Amongst other concepts illustrated in the 
figure is the much greater assay sensitivity 
potentially attainable (using an antibody of given 
affinity) by adoption of a non-competitive 
approach. In short, whereas the maximal sensitivity
realistically achievable using a competitive
design is in the order of $10^{7}$ molecules/ml (using
antibody of the highest affinity found in practice),
a non-competitive method is capable of yielding
sensitivities some orders of magnitude greater
than this. However, Fig. 4 also demonstrates that,
assuming the use of high affinity antibodies (i.e.
$10^{11}$-$10^{12}$ L/M), maximal sensitivities yielded by
isotopically based techniques (whether relying on
labelled antibody (IRMA) or labelled analyte
(RIA), or whether of competitive or non-competitive
design) are closely comparable, i.e.
of the order of $10^{7}$-$10^{8}$ molecules/ml.

This limitation is a manifestation of the fact 
that, in the case of the non-competitive methods, 
an important constraint on assay sensitivity is 
(under certain circumstances) the 'specific activity'
of the label used. On the other hand,
limitation of assay sensitivity due to the low 
specific activity of radioisotopic labels does not 
often arise, in practice, in the case of competitive 
assays, whose sensitivity is generally restricted by 
other factors (Ekins, 1985). The fundamental 
significance of this conclusion is that, only by the
use of labels possessing specific activities higher
than those of the commonly used radioisotopes in
assays of non-competitive design, can current
[sensitivity] limits be breached. Conversely, use of
a higher specific activity label in [illegible]
([illegible] errors incurred in reagent manipulation
of the magnitude generally encountered in practice).
High specific activity non-isotopic labels 

[Illegible] labelled compound [illegible]
the term is widened to signify [illegible]
per unit time per unit weight of [illegible].
Thus it can be used to indicate the [illegible]
concept derives from the [illegible] 'signal
measurement error' (i.e. error in the
[illegible] of the amount of [illegible])
limiting assay sensitivity, and may [illegible]
sensitivity-constraining factors [illegible]
become dominant. Furthermore, in extending
the sensitivities of immunoassay systems beyond
their present limits, the numbers of molecules
involved are low and [illegible]
counting individual [illegible].

(Note that, though
the specific activity of [illegible] does
not, in practice, significantly [illegible] of
competitive assays (see Fig. [illegible]), the [illegible] specific
activity of 3H may severely restrict the sensitivity
of competitive assays ([illegible] hormones)
which rely on the use of this particular radioisotope.)

Specific Activities

125I:                     1 detectable event/sec per 7.5 × $10^{6}$ labelled molecules
3H:                       1 detectable event/sec per 5.6 × $10^{8}$ labelled molecules
Enzymes:                  Determined by enzyme 'amplification factor' and detectability of reaction product
Chemiluminescent labels:  1 detectable event/labelled molecule
Fluorescent labels:       Many detectable events/labelled molecule
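The radioisotope entries in the table follow from the decay law: one atom decays at a rate of ln 2 divided by the half-life, so one event per second requires half-life / ln 2 atoms. The sketch below checks the two figures under stated assumptions: the half-lives are standard reference values rather than values given in the article, each labelled molecule is assumed to carry one radioactive atom, and every decay is assumed to be detected.

```python
import math

SECONDS_PER_DAY = 86_400.0


def molecules_per_event_per_second(half_life_s: float) -> float:
    """Labelled molecules needed to give one decay per second.

    The decay rate per atom is ln(2) / half-life, so one event/sec
    requires half-life / ln(2) atoms (one radioactive atom per molecule
    and 100% counting efficiency are simplifying assumptions).
    """
    return half_life_s / math.log(2)


# Assumed half-lives (standard reference values, not from the article).
HALF_LIFE_I125_S = 59.4 * SECONDS_PER_DAY
HALF_LIFE_H3_S = 12.32 * 365.25 * SECONDS_PER_DAY

i125 = molecules_per_event_per_second(HALF_LIFE_I125_S)
h3 = molecules_per_event_per_second(HALF_LIFE_H3_S)

print(f"125I: 1 event/sec per {i125:.2e} labelled molecules")
print(f"3H:   1 event/sec per {h3:.2e} labelled molecules")
```

Both results land close to the tabulated 7.5 × $10^{6}$ and 5.6 × $10^{8}$; the small difference for 125I depends on the half-life value assumed.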



The importance of background in
non-competitive immunoassays

A second important factor governing the sensitivity
of non-competitive labelled-antibody immunoassays
is the 'background' or 'blank' signal
emitted in the absence of analyte, since [illegible]
the measurement of this signal is clearly a major
determinant of the error in measurement of zero
dose. Amongst contributors to the background
signal are the 'noise' of the measuring [illegible],
extraneous radiation sources [illegible].
[Illegible] of these components is essential;
mere arithmetic subtraction [illegible] background is of
absolutely no benefit in this context.
[Illegible] contribution
is dependent, inter alia, on the [illegible] of
antibody used in the system [illegible] duration of
its exposure to analyte. [Illegible] the amount
of such antibody bound to analyte; however, it
may also increase the [illegible] bound
moiety to a greater proportional extent, and thus
cause a net reduction in sensitivity. This effect
underlies the [illegible]
concentration depicted in [illegible]
the [illegible] labelled antibody
depicted in Fig. 4. [Illegible]
antibody of a [illegible]
lower concentration is required to yield the same
level of analyte binding, albeit with reduced
non-specific binding, thus increasing assay sensitivity.

[Figure 5 plot. y-axis: sensitivity, σ0 (picomoles/liter); x-axis: labelled antibody concentration (moles/liter); annotation: antibody affinity (liters/mole).]

Figure 5. Assay sensitivity (represented by the standard
deviation of the zero dose measurement, σ0), plotted as a
function of the concentration of labelled antibody (of affinity
10[exponent illegible] L/M) used in the assay, assuming different levels of
non-specific binding of labelled antibody. (Note: an irreducible
instrument background has been assumed in the computations
represented; this limits the ultimate sensitivity attainable,
regardless of the concentration of antibody used.)

In summary, the high sensitivity of non- 
competitive labelled antibody methods derives 
essentially from their permitted use of optimal 
concentrations of antibody which (provided non- 
specific binding of labelled antibody is low) 
are generally considerably greater than in com- 
petitive methods, not from the fact that the 
antibody is labelled. Labelled antibody methods 
generally fall in sensitivity as the concentration of 
antibody is reduced towards zero, ultimately 
yielding a sensitivity theoretically identical to that
of competitive methods (Rodbard and Weiss,
1973). (Paradoxically, early exponents of labelled
antibody methods, whilst claiming them to be of
higher sensitivity, also concluded that their
sensitivity was increased by reduction in the
amount of labelled antibody used (Woodhead et
al., 1971). This incorrect conclusion, based on
observation of effects on the slope of the
dose-response curve, exemplifies the many fallacies
encountered in the immunoassay field stemming
from confusion regarding the concept of
sensitivity discussed above.) Finally it should be



emphasized that maximization of the sensitivity of 
a non-competitive immunoassay generally implies 
the selection of reagent concentrations and other 
experimental conditions such that the [analyte
signal/background] ratio (i.e. $s/b$) is maximized.
However, this simple relationship disregards
statistical considerations which arise when the
numbers of detectable events are very low, and a
more appropriate objective may, under these
circumstances, be maximization of the ratio $s^{2}/b$
(Loevinger and Berman, 1951).
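The two objectives can rank experimental conditions differently. The sketch below is illustrative only: the two conditions and their counts are hypothetical, and the interpretation of $s^{2}/b$ as the square of signal over background noise assumes Poisson-distributed background counts, whose standard deviation is the square root of b.

```python
import math


def ratio_s_over_b(s: float, b: float) -> float:
    """Simple signal/background ratio."""
    return s / b


def ratio_s2_over_b(s: float, b: float) -> float:
    """Counting-statistics figure of merit s^2 / b."""
    return s * s / b


def signal_over_noise(s: float, b: float) -> float:
    """Signal relative to the SD of a Poisson background (assumed model)."""
    return s / math.sqrt(b)


# Hypothetical (signal counts, background counts) for two assay conditions.
conditions = {"X": (100.0, 10.0), "Y": (400.0, 50.0)}

best_by_sb = max(conditions, key=lambda k: ratio_s_over_b(*conditions[k]))
best_by_s2b = max(conditions, key=lambda k: ratio_s2_over_b(*conditions[k]))

for name, (s, b) in conditions.items():
    print(name, ratio_s_over_b(s, b), ratio_s2_over_b(s, b),
          round(signal_over_noise(s, b), 2))
print("best by s/b:", best_by_sb, "| best by s^2/b:", best_by_s2b)
```

With these made-up counts, condition X has the higher s/b while condition Y has the higher $s^{2}/b$, which is the situation the text warns about when few events are detected.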



Other performance characteristics of 
competitive and non-competitive 
immunoassays 

Non-competitive designs also display a number of 
other advantages deriving from the relatively high 
antibody concentrations on which they generally 
rely. These include increased reaction speeds
(and hence shorter incubation times), decreased
vulnerability to certain environmental effects
(which cause variations in binding affinity between
antibody and analyte), and reduced dependence of
sensitivity on high antibody binding affinity.

Nevertheless a price has to be paid for these 
benefits; this includes the greater tendency of a 
large amount of antibody to bind molecules 
differing from, but with structural resemblance 
to, the analyte itself, implying a loss of assay 
specificity. This effect generally necessitates the 
use, whenever possible, of an 'immunoextraction'
procedure using a second 'capture' antibody
(usually directed against a different binding site
or 'epitope') as shown in Fig. 3(b). This
technique, the 'sandwich' or 'two-site' immunoassay
(Wide, 1971), thus potentially combines
the twin virtues of ultra-high sensitivity and
specificity (together with short reaction time),
features of crucial importance in many diagnostic
situations (for example, in the detection of AIDS
viral antigens). (Note, however, that the loss of 
specificity inherent in non-competitive assay 
designs implies that they are less readily applic- 
able to the measurement of analytes of small 
molecular size, which cannot be simultaneously 
bound by two different antibodies directed 
against different antigenic sites on the molecule.
Such analytes are generally more appropriately
measured using 'competitive' assay methods.)

Development of ultra-sensitive
immunoassay methodologies

The perception that the development of 'ultra-sensitive'
immunoassay systems (i.e. systems
surpassing conventional RIA methods in sensitivity)
depends on (a) reliance on 'excess reagent' or
'non-competitive' assay designs; (b) the use of
non-isotopic labels displaying higher specific
activities than commonly used radioisotopes; (c)
the development of efficient separation systems
(ensuring minimization of non-specific antibody
binding, and hence of signal 'backgrounds'); and
(d) dual or multi-antibody analyte-recognition
systems (exemplified by 'sandwich' or two-site
assays) to maintain/increase assay specificity, has
formed the basis of our own laboratory's immunoassay
development since the early to mid
1970s (Ekins, 1978). This led us, inter alia, to
immediate recognition (Ekins, 1979, 1980) of the
importance of the in vitro techniques of monoclonal
antibody production pioneered by Kohler
and Milstein (1975), which are currently the
subject of bitter patent disputes in the USA
(Ezzell, 1986, 1987a,b), and which may be
expected in Europe.

Meanwhile, of the candidate labels for use in 
this context, both chemiluminescent and fluorescent
labels offer many attractions. The development
of stable, highly chemiluminescent acridinium
esters by McCapra and his colleagues
(McCapra et al., 1977) has subsequently been
exploited by Weeks et al. (1983, 1984) and, more
recently, by several commercial kit manufacturers;
other workers have used more conventional
chemiluminescent compounds to label [illegible]
1984, 1985; Barnard et al., 1985). Yet others have
relied on enzyme labels to catalyse chemiluminogenic
(Whitehead et al., 1983) and fluorogenic
(Shalev et al., 1980) reactions as indicated above.
Detailed description of these various methodologies
is presented by others in this volume and
need not be duplicated here.

Common to all the 'ultra-sensitive' immunoassay
methodologies relying on such alternative
labels is their dependence on a non-competitive,
labelled antibody, assay strategy whenever
appropriate; however, for the reasons indicated
above, competitive methods continue to be
generally employed for the measurement of
analytes of small molecular size (e.g. therapeutic
drugs, steroid and thyroid hormones, etc.).
Nevertheless, the convenience (from a manufacturing
viewpoint, and for other technical reasons)
of relying on standard labelling procedures has
meant that, even in these cases, [illegible]
techniques are increasingly preferred. Though the
commercial kits based on these various labels
differ to a minor extent in sensitivity, specificity,
convenience, etc., such differences are [illegible]
partially attributable to differences in the
physicochemical characteristics of the antibodies used
in the kits and to other 'immunological' factors
[illegible] with the particular [illegible] of the
label.

Despite the obvious attractions of chemiluminescent
techniques in an immunoassay context,
[illegible] use of fluorescent labels combined with
[illegible] required for ultra-sensitive immunoassay as
indicated above. However, more importantly,
fluorescence techniques also appeared to provide
a simple route to the development of 'multi-analyte'
assay systems of the kind described
[illegible]. In pursuance of this strategy, we began
collaboration with LKB/Wallac, ca 1976-77, [illegible]
the development of the instrumentation and
technology required to develop such methods.

[Illegible] of fluorescent substances
generally known as the lanthanide chelates
(including, in particular, the chelates of europium,
samarium and terbium) facilitate such
development, possessing prolonged fluorescence
[illegible], large Stokes shifts
[illegible] other desirable physical characteristics
[illegible] permit the construction of relatively
cheap instrumentation for their measurement
(Marshall et al., 1981; Hemmila et al., 1983). The
fluorescent properties of the lanthanide chelates
may be compared with those of a conventional
fluorophor such as fluorescein, which is characterized
by a smaller Stokes shift (28 nm)
and a fluorescent decay time and emission
spectrum which imply that it is less readily
[illegible] fluorescent substances present [illegible]
measured in the presence of a fluorescence
background (deriving from extraneous sources)
which, in practice, approaches zero. Fig. 6
illustrates the basic concepts involved in pulsed-light,
time-resolved, fluorescence measurement
which form the basis of the DELFIA immunoassay
system currently marketed by LKB/Wallac.
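The benefit of delayed, gated measurement can be sketched with a single-exponential decay model, in which the fraction of total emission collected between two times is the difference of two exponentials. All numbers below are illustrative assumptions rather than values from the article: a lanthanide-chelate lifetime of 500 microseconds, a background lifetime of 10 nanoseconds, and a counting gate open from 400 to 800 microseconds after the excitation pulse.

```python
import math


def gated_fraction(lifetime_s: float, gate_open_s: float, gate_close_s: float) -> float:
    """Fraction of a single-exponential emission collected inside the gate.

    For intensity proportional to exp(-t / lifetime), the share emitted
    between gate_open and gate_close is
    exp(-gate_open / lifetime) - exp(-gate_close / lifetime).
    """
    return math.exp(-gate_open_s / lifetime_s) - math.exp(-gate_close_s / lifetime_s)


# Illustrative values only (assumptions, not taken from the article).
LABEL_LIFETIME_S = 500e-6        # long-lived lanthanide chelate
BACKGROUND_LIFETIME_S = 10e-9    # short-lived extraneous fluorescence
GATE_OPEN_S, GATE_CLOSE_S = 400e-6, 800e-6

label_fraction = gated_fraction(LABEL_LIFETIME_S, GATE_OPEN_S, GATE_CLOSE_S)
background_fraction = gated_fraction(BACKGROUND_LIFETIME_S, GATE_OPEN_S, GATE_CLOSE_S)

print(f"label emission collected:      {label_fraction:.3f}")
print(f"background emission collected: {background_fraction:.3e}")
```

Under these assumptions roughly a quarter of the label's emission falls inside the gate, while the background contribution has decayed to effectively nothing before the gate opens.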

Though it is inappropriate to pursue this 
subject in greater detail, attention should also be 
drawn to the possibilities offered by phase- 
resolved fluorimetry. This permits separate iden- 
tification of fluorophores differing in fluorescence 
lifetime by their exposure to a sinusoidally
modulated exciting light source, and observation
of their demodulated, phase-shifted, light emission
(McGown and Bright, 1984). This technique
offers the possibility both of the development of
homogeneous assays (relying on a difference in 
fluorescence decay time of bound and free forms 
of the fluorescent-labelled molecule), and of 
discriminating between two labelled antibodies in 
the context of multi-analyte 'ratiometric' im- 
munoassay as discussed below. 
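For a fluorophore with a single-exponential decay, the standard frequency-domain relations are a phase shift of arctan(ωτ) and a demodulation factor of 1/sqrt(1 + (ωτ)²), where ω is the angular modulation frequency and τ the lifetime. The sketch below applies them to two hypothetical labels; the 30 MHz modulation frequency and the 1 ns and 4 ns lifetimes are assumptions chosen only to show that the two labels would be distinguishable.

```python
import math


def phase_shift_rad(mod_freq_hz: float, lifetime_s: float) -> float:
    """Phase lag of the emission relative to the excitation: atan(omega * tau)."""
    return math.atan(2.0 * math.pi * mod_freq_hz * lifetime_s)


def demodulation(mod_freq_hz: float, lifetime_s: float) -> float:
    """Relative modulation depth of the emission: 1 / sqrt(1 + (omega * tau)^2)."""
    omega_tau = 2.0 * math.pi * mod_freq_hz * lifetime_s
    return 1.0 / math.sqrt(1.0 + omega_tau * omega_tau)


# Hypothetical modulation frequency and lifetimes (assumptions).
MOD_FREQ_HZ = 30e6
lifetimes_s = {"label A": 1e-9, "label B": 4e-9}

for name, tau in lifetimes_s.items():
    phi_deg = math.degrees(phase_shift_rad(MOD_FREQ_HZ, tau))
    m = demodulation(MOD_FREQ_HZ, tau)
    print(f"{name}: phase shift {phi_deg:.1f} deg, demodulation {m:.3f}")
```

The longer-lived label shows the larger phase shift and the smaller modulation depth, which is the property that allows two labelled antibodies, or bound and free forms, to be told apart.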

'AMBIENT ANALYTE' IMMUNOASSAY

Before proceeding to a discussion of the develop- 
ment of multi-analyte assays, another important 
concept, termed 'ambient analyte immunoassay' 
(Ekins, 1983b), must first be examined. This 
term is intended to describe a type of immunoassay
system which, unlike conventional
methods, measures the analyte concentration in
the medium to which an antibody is exposed,
being essentially independent both of sample
volume, and of the amount of antibody present.
This concept is illustrated in Fig. 7, and relies on
the physicochemically-based proposition that
when a 'vanishingly small' amount of antibody
(preferably, but not essentially, coupled to a solid
support) is exposed to an analyte-containing
medium, the resulting (fractional) occupancy of
antibody binding sites solely reflects the ambient
analyte concentration. Clearly the binding by
antibody of analyte results in a depletion of the
amount of analyte in the surrounding medium,
but provided the proportion so bound is small
(i.e. less than, for example, 1% of the total) such
disturbance can be ignored. (This effect is closely
analogous to that caused by the introduction of a
thermometer into a medium possessing a much
larger thermal capacity; the temperature disturbance
caused by the thermometer itself is negligible
and can, in these circumstances, be disregarded.)

The principles of ambient analyte assay derive
from the recognition that all immunoassays
essentially depend upon measurement of the
'fractional occupancy' by analyte of antibody
binding sites following reaction of analyte with
antibody as discussed above (Figs 3(a) and (b)).
The fractional occupancy of ('monospecific' or
'monoclonal') antibody binding sites in the
presence of varying analyte concentrations, plotted
[illegible] antibody concentration, is portrayed
[illegible]. [Illegible] fraction of analyte bound [illegible]
plotted in this figure. (Note: for the sake of
generality, all concentrations in this figure are
expressed in terms of 1/K, where K is the affinity
[illegible] of the antibody. For example, if K =
$10^{11}$ L/M, a concentration of 0.1 × 1/K represents
$10^{-12}$ M/L = 6.02 × $10^{8}$ molecules/ml.)

Figure 6. Basic principles of pulsed-light, time-resolved
fluorescence. Fluorescence emitted by the fluorophor (typically
a europium chelate) is distinguished from background
fluorescence, which decays more rapidly.

It should be particularly noted that, at antibody
concentrations of less than ca 0.01 × 1/K, antibody
fractional occupancy is essentially dependent
solely on the analyte concentration in the
medium, and is independent of variations in
antibody concentration. This reflects the fact that
this concentration of antibody binds less than
approximately 1% of the analyte in the medium
irrespective of its concentration. This implies, for
example, that the introduction of 10, 100 or 1000
antibody molecules into a medium containing
billions of analyte molecules will result, in each
case, in virtually identical fractional antibody
binding-site occupancy, the upper limit of antibody
concentration being determined by the
antibody affinity constant. (An antibody concentration
of 0.01 × 1/K is a hundred-fold less than
that (1 × 1/K) necessary to bind 50% of a 'trace'
amount of analyte (see Fig. 8) [illegible] claimed by Berson
and Yalow [illegible] maximizing assay 'sensitivity'
[illegible] slope of the dose-response curve
when [illegible] in terms of bound/free [illegible].
[Illegible] conclusion has subsequently
become incorporated into the mythology of
radioimmunoassay design which, regrettably, a
majority of kit manufacturers continue to [illegible].)
The ambient analyte assay concept [illegible]
hormone immunoassay (Ekins et al., [illegible])
[illegible] without requiring the collection of
[illegible]. However, the concept also underlies our
approach to multi-analyte immunoassay, also
under development in our laboratory.

Figure 8. Fractional antibody binding-site occupancy plotted as a function of antibody binding-site concentration for
different values of analyte (antigen) concentration ([An]). All concentrations are expressed in units of 1/K. [Illegible]
percentage binding of analyte to antibody [illegible] shown. For antibody concentrations of less than [illegible] (approximately),
percentage binding of analyte is [illegible], and fractional binding-site occupancy is essentially [illegible] in antibody
concentration extending over several orders of magnitude, being governed solely by [An]. [Illegible]
other 'competitive' immunoassays are [illegible] (e.g. Berson and Yalow, 1973), [illegible] above.

MULTI-ANALYTE 'RATIOMETRIC'
IMMUNOASSAY SYSTEMS

The concepts relating to ambient analyte immunoassay
and assay sensitivity outlined above
are both exploited in our present development of
a 'random access', multi-analyte, immunoassay
technology capable of measuring, in the same
small sample, virtually any number of individual
analytes from selected analyte 'menus' (e.g. a
hormone menu, a viral antigen menu, an allergen
menu, etc.). Many examples of a need to measure
a multiplicity of different analytes in the same
sample exist in medical diagnosis, for example in
the routine diagnosis of thyroid disease, where it
is frequently necessary to measure a number of
different hormones and thyroid-related proteins.
At present, clinicians frequently experience difficulty
in deciding on the best sequence of tests to
arrive at a correct diagnosis. Such problems
would be overcome were all relevant analytes
measurable at a cost comparable to the cost of
measurement of a single substance. Our own
immediate objective is the development of
technology permitting the measurement of complete
'hormone profiles' using a single small blood
sample. However, the need for 'multi-analyte' or
'random access' measurement is not confined to
medical diagnosis: it also arises, for example, in
the pharmaceutical industry (where there exists a
requirement to ensure the purity of proteins
synthesized by recombinant DNA techniques), in
the food industry and elsewhere. Though still at
an early stage, our approach to the achievement
of this objective can be briefly indicated.

Multi-analyte assay: general principles 

As discussed above, the notion of ambient
analyte assay simultaneously introduces two
extremely important and novel concepts: (a) that
an estimate of analyte concentration can be based
[illegible] of an infinitesimal amount of
'sampling' antibody, and (b) that such an estimate
[illegible] from a direct measurement of fractional
antibody occupancy by analyte, irrespective of
the exact amount of antibody used. It should be
emphasized that the latter proposition is valid
only in the context of ambient analyte assay and
is not true in current conventional immunoassay
systems (in which fractional antibody occupancy
depends both upon the amount of antibody in the
system, and sample volume; see [illegible]).
[Illegible] exposure of a small number of antibody
molecules (in the form, for example, of [illegible]
located on a solid support) to an analyte-containing
fluid results in [illegible] occupancy of antibody
binding sites in the microspot reflecting the
analyte concentration in the medium. Following
such exposure, the antibody-bearing [illegible]
be removed and exposed to a [illegible]
solution containing a high concentration of an
appropriate second antibody directed [illegible]
antibody simulating antigen [illegible]
unoccupied binding [illegible]
mirror-image anti-idiotypic antibody; the use of
such an antibody instead of [illegible] is
convenient but not essential, and is suggested
here merely to simplify illustration of the basic
concepts involved.)

Subsequently, an estimate of binding-site occupancy
[illegible] 'sampling' (solid phase) [illegible]
located in the microspot may be derived by
[illegible] of the ratio of signals [illegible] by the
two antibodies forming the dual-antibody [illegible].
[Illegible] can be conveniently achieved by
[illegible] the 'sampling' and 'developing'
antibodies with different labels, for example a pair of
radioactive, enzyme or chemiluminescent [illegible].
Fluorescent labels are nevertheless particularly
useful in this context because, by the use of
optical scanning techniques, they permit arrays of
different antibody 'microspots' distributed over a
surface, each directed against a different analyte,
to be individually examined, thus enabling
multiple assays to be simultaneously carried out
on the same small sample. Fig. 9 illustrates these
basic ideas, and Fig. 10 such an array.



Microspot immunoassay sensitivity:
theoretical considerations

The notion that it is, in principle, possible to
measure an analyte concentration using a 'microspot'
of antibody comprising a number of antibody
molecules in the range ca 10[exponent illegible]-10[exponent illegible] is likely, at first
sight, to appear surprising, and may indeed
provoke scepticism regarding the assay sensitivities
potentially attainable using this approach.
Clearly a number of factors, such as the [illegible]
support, etc., are likely to play [illegible]
determining final assay sensitivity. Such factors
are, in turn, dependent on the efficiency with
which the particular labels used can be detected,
the adsorption properties of antibody supports
[illegible]
sensitivities likely to be achieved on the basis of
some simple theoretical calculations. [Illegible]

Figure 9. Basic principle of dual-label ambient-analyte
[illegible] fluorescent labelled antibodies. [Illegible]
fluorescent photons emitted reflect [illegible].

Figure 10. 'Multi-analyte' antibody array. Each
'microspot' represents a 'vanishingly small' amount of
antibody directed against an individual analyte.

Fig. 11 illustrates the surface of an antibody
microspot, of surface area A (µm²) [illegible]
coated with [illegible]
monomolecular layer of density D (molecules/µm²).
[Illegible] the system is thus given by AD/v. (Note: the fact
that antibody is situated on the surface of a solid
support, and not evenly distributed throughout
the medium, does not affect the extent of analyte
[illegible], assuming
that antibody binding sites are not impeded in
their reactions and have not been damaged during
the coating process.)

Figure 11. Microspot ambient-analyte immunoassay. The microspot
[illegible] to be uniformly coated with antibody,
though if the dual-labelled antibody [illegible] coating is not essential. The
minimum fluid volume for ambient analyte assay conditions [illegible]
the ratiometric approach) is shown. Diagram labels: surface area, A (µm²);
antibody density, D; antibody affinity, K (L/M); Avogadro's number, N (molecules/M);
minimum test sample volume: A × D × K × [illegible].

Meanwhile, fractional occupancy (F) of antibody
binding sites by analyte (at equilibrium) is



given by the equation:

$$F^{2} - F\left(\frac{1}{q} + \frac{p}{q} + 1\right) + \frac{p}{q} = 0 \qquad (1)$$

where p = analyte concentration, q = antibody
concentration (both expressed in units of 1/K).

Thus, for antibody binding site concentrations
$\to 0$ (i.e. q < 0.01), $F \approx p/(1 + p)$ (see Fig. 8).
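Eq. (1) can be checked numerically. The sketch below solves the quadratic for the physically meaningful root (the one lying between 0 and 1) and compares it with the limiting form p/(1 + p). The function name and the sample values of p and q are illustrative choices, not taken from the article.

```python
import math


def fractional_occupancy(p: float, q: float) -> float:
    """Physical root F of F^2 - F(1/q + p/q + 1) + p/q = 0 (Eq. 1).

    p and q are the analyte and antibody concentrations in units of 1/K.
    Multiplying through by q gives q*F^2 - (1 + p + q)*F + p = 0; the
    smaller root is written in a form that stays stable as q tends to 0.
    """
    b = 1.0 + p + q
    discriminant = b * b - 4.0 * q * p
    return 2.0 * p / (b + math.sqrt(discriminant))


p = 1.0  # analyte concentration equal to 1/K (illustrative)
limit = p / (1.0 + p)
for q in (1e-4, 1e-3, 1e-2, 1.0, 10.0):
    F = fractional_occupancy(p, q)
    print(f"q = {q:<7g} F = {F:.5f}  (p/(1+p) = {limit:.5f})")
```

For q at or below 0.01 the computed occupancy stays within a fraction of a percent of p/(1 + p), whereas at q = 1 or 10 it falls well below that limit, in line with the text's statement that occupancy becomes independent of antibody concentration only at low q.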

Likewise the fraction of analyte bound by
antibody (f) at equilibrium is given by the equation:

$$f^{2} - f\left(\frac{1}{p} + \frac{q}{p} + 1\right) + \frac{q}{p} = 0 \qquad (2)$$

Thus, for analyte concentration $\to 0$ (i.e. p <
0.01), $f \approx q/(1 + q)$ (see Fig. 8). Furthermore,
when q < 0.01, and when $p \geq 0$, f < 0.01.
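The same approach applies to Eq. (2). The sketch below, a minimal check with illustrative values, confirms that when q is below 0.01 the antibody never binds as much as 1% of the analyte, whatever the analyte concentration, and that f approaches q/(1 + q) as p tends to zero.

```python
import math


def fraction_of_analyte_bound(p: float, q: float) -> float:
    """Physical root f of f^2 - f(1/p + q/p + 1) + q/p = 0 (Eq. 2).

    p and q are the analyte and antibody concentrations in units of 1/K.
    Multiplying through by p gives p*f^2 - (1 + p + q)*f + q = 0; the
    smaller root is the one lying between 0 and 1.
    """
    b = 1.0 + p + q
    discriminant = b * b - 4.0 * p * q
    return 2.0 * q / (b + math.sqrt(discriminant))


q = 0.0099  # just under the 0.01 threshold quoted in the text
analyte_levels = (1e-6, 1e-3, 0.1, 1.0, 10.0, 1000.0)  # illustrative values of p
fractions = [fraction_of_analyte_bound(p, q) for p in analyte_levels]

for p, f in zip(analyte_levels, fractions):
    print(f"p = {p:<8g} f = {f:.6f}")
```

The bound fraction is largest at the lowest analyte concentration and decreases as p rises, so the "thermometer" disturbance is at its worst, yet still below 1%, for trace amounts of analyte.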

Expressed in units of 1/K, the concentration (q)
in the assay of 'sensing' antibody situated on the
microspot is given by $DAK/(v \times 6 \times 10^{20})$ (since
Avogadro's constant, expressed as the number of
molecules/mmol, is $6 \times 10^{20}$ (approximately)).
The fraction of an analyte concentration $\to 0$
which will be bound to the spot is therefore
$DAK/(v \times 6 \times 10^{20} + DAK)$, implying that the
number of analyte molecules bound to the spot is
given by $vCDAK/(v \times 6 \times 10^{20} + DAK)$.
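These expressions are straightforward to evaluate. In the sketch below, D = $10^{5}$ molecules/µm² and K = $10^{11}$ L/M follow the values used in the article's worked examples, while the spot area A = 100 µm², the sample volume v = 1 ml and the analyte concentration passed to the last function are hypothetical. C is taken to be the analyte concentration in molecules/ml, which is what the units of the expression imply. The minimum-volume helper is not a formula quoted from the text; it simply rearranges the condition q ≤ 0.01.

```python
MOLECULES_PER_MMOL = 6e20  # Avogadro's constant per mmol, rounded as in the text


def sensing_antibody_conc(D: float, A: float, K: float, v: float) -> float:
    """q in units of 1/K (D molecules/um^2, A um^2, K L/M, v ml)."""
    return D * A * K / (v * MOLECULES_PER_MMOL)


def fraction_of_trace_analyte_bound(D: float, A: float, K: float, v: float) -> float:
    """DAK / (v x 6e20 + DAK), valid as analyte concentration tends to zero."""
    dak = D * A * K
    return dak / (v * MOLECULES_PER_MMOL + dak)


def analyte_molecules_bound(C: float, D: float, A: float, K: float, v: float) -> float:
    """Molecules captured from v ml of sample holding C molecules/ml."""
    return v * C * fraction_of_trace_analyte_bound(D, A, K, v)


def min_sample_volume_ml(D: float, A: float, K: float, q_max: float = 0.01) -> float:
    """Smallest v (ml) keeping q <= q_max, i.e. ambient-analyte conditions."""
    return D * A * K / (q_max * MOLECULES_PER_MMOL)


# D and K follow the article's worked example; A and v are hypothetical.
D, A, K, v = 1e5, 100.0, 1e11, 1.0
q = sensing_antibody_conc(D, A, K, v)

print(f"q = {q:.3e} (units of 1/K)")
print(f"fraction of trace analyte bound = {fraction_of_trace_analyte_bound(D, A, K, v):.3e}")
print(f"molecules bound from 1e6 molecules/ml = {analyte_molecules_bound(1e6, D, A, K, v):.1f}")
print(f"minimum sample volume = {min_sample_volume_ml(D, A, K):.3f} ml")
```

With these inputs q is well under 0.01, so a 1 ml sample satisfies the ambient-analyte condition for a spot of this size.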

Case 1: sandwich (two-site) assay. Following
incubation of sample with antibody, we assume
the sample is removed, and the microspot then
exposed to a volume V (ml) of a solution of a
second, labelled, 'developing' antibody of affinity
K' (L/M) at a concentration given by Q
(expressed in units of 1/K').

The fraction of analyte bound by labelled
antibody (F*) at equilibrium is given by the
equation:

$$F^{*2} - F^{*}\left(\frac{1}{P} + \frac{Q}{P} + 1\right) + \frac{Q}{P} = 0 \qquad (3)$$

where P represents the analyte concentration in
the developing-antibody solution, expressed in
units of 1/K', i.e. $vCDAKK'/[(v \times 6 \times 10^{20} +
DAK)\,V \times 6 \times 10^{20}]$.

Assuming P < 0.01, $F^{*} \approx Q/(1 + Q)$. (For
example, if Q = 1, the fraction of analyte
molecules bound by labelled antibody = 0.5
approximately.) Thus, since the number of
analyte molecules bound to the spot is given by
$vCDAK/(v \times 6 \times 10^{20} + DAK)$, the number of
analyte molecules labelled by the second,
developing, antibody is given by
$vCDAKQ/[(v \times 6 \times 10^{20} + DAK)(1 + Q)]$, and the surface density
of such molecules is given by
$vCDKQ/[(v \times 6 \times 10^{20} + DAK)(1 + Q)]$. Moreover, assuming that
$DAK \ll v \times 6 \times 10^{20}$ (i.e. that the amount of
antibody in the system is such that 'ambient assay'
conditions prevail), then the surface density (D*)
of developing-antibody molecules =
$CDKQ/[(6 \times 10^{20})(1 + Q)]$ approximately. It should be
noted that D* is independent of both v and V;
also that the ratio $D^{*}/D = C \times KQ/[(6 \times 10^{20})(1 + Q)]$
= C × constant.

If the minimum detectable surface density of
developing-antibody molecules (i.e. the
standard deviation of the measurement of D*
when C = 0) is given by $D^{*}_{\min}$ (molecules/µm²),
and $C_{\min}$ represents the minimum detectable
analyte concentration in the test sample, then,
disregarding non-specific binding of developing
antibody within the microspot area,

$$C_{\min} = D^{*}_{\min} \times \frac{(6 \times 10^{20})(1 + Q)}{DKQ} \qquad (4)$$

For example, if Q = 1, D = $10^{5}$ molecules/µm²,
K = $10^{11}$ L/M and $D^{*}_{\min}$ = 20 molecules/µm², then
$C_{\min}$ = 2.4 × $10^{6}$ molecules/ml = 4 × $10^{-15}$ M/L. It
should be noted that, in this example, the fractional
occupancy of the sensing antibody binding sites
by the minimum detectable analyte concentration
is 0.04%.
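The arithmetic of this example can be reproduced directly from Eq. (4). The values Q = 1, D = $10^{5}$ molecules/µm² and K = $10^{11}$ L/M are those the competitive example later in the text says it shares with this one; the function and variable names are illustrative.

```python
MOLECULES_PER_MMOL = 6e20  # molecules/ml corresponding to 1 M, rounded as in the text


def c_min_molecules_per_ml(d_star_min: float, D: float, K: float, Q: float) -> float:
    """Eq. (4): minimum detectable analyte concentration in molecules/ml."""
    return d_star_min * MOLECULES_PER_MMOL * (1.0 + Q) / (D * K * Q)


Q = 1.0          # developing-antibody concentration, units of 1/K'
D = 1e5          # sensing-antibody density, molecules/um^2
K = 1e11         # sensing-antibody affinity, L/M
D_STAR_MIN = 20  # minimum detectable developing-antibody density, molecules/um^2

c_min = c_min_molecules_per_ml(D_STAR_MIN, D, K, Q)
c_min_molar = c_min / MOLECULES_PER_MMOL   # molecules/ml -> M/L
p_min = c_min_molar * K                    # concentration in units of 1/K
occupancy = p_min / (1.0 + p_min)          # F = p/(1 + p) under ambient conditions

print(f"C_min = {c_min:.2e} molecules/ml = {c_min_molar:.1e} M/L")
print(f"fractional occupancy at C_min = {occupancy * 100:.3f}%")
```

The script returns 2.4 × $10^{6}$ molecules/ml, 4 × $10^{-15}$ M/L and an occupancy of about 0.04%, consistent with the figures quoted in the text.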

Case 2: anti-idiotypic antibody ('competitive') 
assay, n this case, we assume that, following 
removal of the sample, the microspot is exposed 
to a volume V{m\) of a solution of (for example) a 
second, labelled, anti-idiotypic antibody reacting 
•vith unoccupied sites on the sensing antibodv 
Jsing similar reasoning as above, we ma'v 
ikewise assume that the fraction of such <ites 
vhich become occupied by the anti-idiotvpic 
developing' antibody is given bv Ql(\ + 
'here Q is the developing-antibody concentra- 
lon. However, the minimum detectable surface 
ensity of anti-idiotypic antibody is not in a 
ompetitive design, the critical determinant of 
ssay sensitivity; this parameter is essentially 
overned by the precision of the density measure- 
lent. 

From Eq. (1), the fraction of sites unoccupied
by analyte $= 1/(1 + p)$, and the fraction occupied
by anti-idiotypic antibody $= Q/[(1 + p)(1 + Q)]$.
Thus, if the CV in the measurement of anti-idiotypic
antibody density is $\varepsilon$, the standard deviation is
$\varepsilon Q/[(1 + p)(1 + Q)]$. This term also represents the
SD in the estimate of the fraction of sites occupied
by analyte. Since the total number of antibody
binding sites in the spot is $DA$, the SD in the
estimate of occupied sites as $p \to 0$ approximates
$\varepsilon DAQ/(1 + Q)$, and the SD in the surface density
estimate is thus $\varepsilon DQ/(1 + Q)$. But the SD in the
measurement of fractional binding-site occupancy
when $p \to 0$ defines $D_{min}$, and hence the minimum
detectable analyte concentration in the test sample
as indicated in Eq. (4).
Thus



For example, if values of $Q = 1$, $D = 10^{5}$
molecules/µm², and $K = 10^{11}$ L/M are assumed as
in the non-competitive example considered
above, and the CV in the measurement of
anti-idiotypic antibody density in the microspot is
1% (i.e. $\varepsilon = 0.01$), then $D_{min}$ = 500 molecules/µm²
and $C_{min} = 6 \times 10^{7}$ molecules/ml $= 10^{-13}$ M/L.
The fractional occupancy of the sensing
antibody binding sites by the minimum detectable
analyte concentration is then 1%. It
should be noted that the sensitivity limit of the
assay (expressed in molar terms) is identical to that
previously established for conventional 'competitive'
assays (Ekins and Newman 1970),
which underlies the predictions represented in 

Such considerations appear to suggest (a) that
microspot assay sensitivities superior to those
obtainable by conventional radioisotopically
based immunoassays are achievable, and (b) that
the sensitivities of non-competitive microspot
assays are likely to be considerably greater than
those of corresponding competitive microspot
assays. It must be emphasized, however, that
though such predictions are likely to prove
broadly valid, no assumptions regarding the nature
of the labels and signal-measuring instrument
used are incorporated in the simple theoretical
analysis discussed above. Such factors are clearly
of importance in determining overall microspot
immunoassay performance.



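$$C_{min} = D_{min} \times \frac{(6 \times 10^{20})(1 + Q)}{DKQ} \qquad (5)$$

$$= \frac{\varepsilon DQ}{1 + Q} \times \frac{(6 \times 10^{20})(1 + Q)}{DKQ} \qquad (6)$$

$$= \frac{\varepsilon}{K} \times (6 \times 10^{20}) \qquad (7)$$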
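As a check on Eqs. (5) to (7), the following sketch evaluates the competitive-assay detection limit numerically. It assumes Q = 1, D = 10^5 molecules/µm², K = 10^11 L/M and a CV of 1%, values taken to match the worked example in the text; the function names are illustrative only.

```python
import math

MOLECULES_PER_ML_PER_MOLAR = 6e20  # rounded conversion factor used in the text


def d_min_competitive(eps, D, Q):
    """SD of the surface-density estimate at zero dose: eps * D * Q / (1 + Q)."""
    return eps * D * Q / (1.0 + Q)


def c_min_from_d_min(d_min, D, K, Q):
    """Eq. (5): detection limit (molecules/ml) from the minimum detectable density."""
    return d_min * MOLECULES_PER_ML_PER_MOLAR * (1.0 + Q) / (D * K * Q)


def c_min_competitive(eps, K):
    """Eq. (7): detection limit (molecules/ml), independent of D and Q."""
    return eps / K * MOLECULES_PER_ML_PER_MOLAR


# Assumed illustrative parameters.
eps, D, K, Q = 0.01, 1e5, 1e11, 1.0

d_min = d_min_competitive(eps, D, Q)
c_via_eq5 = c_min_from_d_min(d_min, D, K, Q)
c_via_eq7 = c_min_competitive(eps, K)

print(f"D_min = {d_min:.3g} molecules/um^2")
print(f"C_min = {c_via_eq7:.3g} molecules/ml "
      f"= {c_via_eq7 / MOLECULES_PER_ML_PER_MOLAR:.3g} M/L")
print("Eq. (5) and Eq. (7) agree:", math.isclose(c_via_eq5, c_via_eq7))
```

The cancellation of D and Q between Eqs. (5) and (7) is what makes the competitive detection limit depend only on the measurement precision and the affinity constant.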



Practical implementation 

The concepts discussed above are clearly exploitable
using a variety of antibody labels, including
chemiluminescent labels; however, our preliminary
studies have been based on the use of
fluorophores, since the technology
of simultaneous measurement of dual fluorescence
is already well established.
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ence microscopy, a small area of the specimen is 
illuminated by a focused laser beam; the fluoresc- 
ence photons emanating solely from this area are, 
in turn, focused onto a photon detector. Both the 
intensity of illumination and the efficiency of light 
collection diminish rapidly with distance from the 
focal plane (Fig. 12). At the 'confocal' point, the 
projection of the illumination pinhole and the 
back-projection of the detector pinhole coincide. 
Such systems contrast with conventional epi- 
fluorescence methods, where the specimen is 
exposed to an essentially uniform flux of illumination
(White et al., 1987).



Sensitivity of current instruments. Typically
fluorescence photons emanating from the laser- 




[Figure 12 labels: Objective; Object in focal plane; Object not in focal plane]

Figure 12. Principle of the confocal microscope. Illuminating
light is focused at a point in the focal plane. Reflected light
from this point is focused onto a detector. A complete
two-dimensional image of structures within the focal plane is
obtained by scanning the selected area of interest and may
be stored in a microcomputer for video display.



illuminated area are detected by a low dark
current photomultiplier. Electrons spontaneously
emitted by the photomultiplier cathode
contribute to the background signal of the
instrument and must, for highest sensitivity, be
minimized. Fortunately the overall design of such
instruments permits the
cathode to be of very small area, so that this
particular source of background noise is not only
small, but can be expected to reduce in relative
importance with future improvement in
photomultiplier design. Meanwhile, current instruments
already display very high sensitivity of detection
of fluorescent signals. For example, the confocal
microscope manufactured by Zeiss is claimed to
display a lower detection limit for fluorescein of
about ten molecules/µm² (Ploem, 1986). Most
commercially available FITC-labelled IgG attains
a fluorophore/protein molar ratio of ~4; thus the
detection limit of the Zeiss microscope is
2-3 FITC-labelled IgG molecules/µm². This
corresponds to a potential detection limit
of ~2.4 × 10^5 molecules/ml for a two-site assay
assuming the same parameter values as used in

the examples discussed above, or 24 x iff 

i$ e L C /M es/ml using 3 ,sensing ' antibody of affinit y 

Another comparable instrument is the Bio-Rad
Lasersharp laser scanning confocal
microscope, which we are currently using in the
development of ratiometric, multi-analyte assay
methodology in accordance with the principles
outlined above (see Fig. 13). The argon laser in
this system possesses two excitation lines at 488
and 514 nm. It is thus particularly efficient for the
excitation of green-emitting fluorophores
such as FITC (which displays an excitation
maximum at 492 nm). However, it is considerably
less efficient in the excitation of red-emitting
fluorophores such as Texas red (excitation
maximum 596 nm). Nevertheless, the ratiometric
immunoassay principle permits considerable variation
in detection efficiencies of the two labels
relied on since, inter alia, the specific activities of
the two labelled antibody species forming the
antibody couplets can be chosen to yield optimal
signal ratios in the region of unity. Thus the
inefficiency of the argon laser in exciting
red-emitting fluorophores is not necessarily a major
handicap in the present context.

Though the current Lasersharp instrument
relies on a conventional microscope rather than a
purpose-designed optical system (and appears to
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shape and size for optimum adjusimem to th? 
specimen structure. More generally tht ? . 

here. br0ad P ri ™ples presented 



[Figure 13 labels: Beam splitters; Objective; Antibody array; Antibody microspot]

Figure 13. Dual-channel confocal fluorescence microscope
permitting simultaneous measurement of the fluorescent
signals from two fluorophores situated at the focal point. By
scanning the antibody array, the ratio of signals from each
antibody microspot may be determined.

be less sensitive), it permits quantification of
the fluorescent signals generated from any
selected area. Initial studies have revealed that,
under conditions that are not necessarily optimal,
the instrument is capable of detecting approximately
twenty-five FITC-labelled IgG molecules/µm²,
scanning an area of ~50 µm². It
must be stressed that neither of these confocal
microscopes is designed specifically for
ratiometric multi-analyte immunoassay;
it can be anticipated that future instruments
constructed specifically for this purpose are likely
to prove both cheaper and more sensitive.

Other instruments. The MPM 200 Microscope
Photometer manufactured by Zeiss



high signal-to-noise r Z anH B ™u' y,eld 
immunological activity (Fig. 16).

Figure 14. Fluorescence signal (arbitrary units)

Figure 16. Surface density of immunoreactive IgG molecules (number of molecules/µm²) plotted as a function of the total
surface density of IgG (number of molecules/µm²) on Dynatech Microfluor white microtitre plates



thought it useful to confirm the validity of our 
general concepts by comparing the performance 
of certain assays when constructed in microspot 
format and when conventionally designed. For 
example, we have compared a dual-labelled 
tumour necrosis factor (TNF) ratiometric assay 
system using Texas red and FITC-labelled anti- 
bodies with an optimized IRMA system using 
identical antibodies but with the second antibody
125I-labelled. Although unoptimized, the
ratiometric microspot assay yielded formal sensitivity
values closely approaching those of the
conventional, optimized, IRMA. Although these
results verify the general concepts underlying
ratiometric microspot immunoassay methodology,
further work is required to achieve the
considerably greater sensitivity that theory predicts
as achievable using optimized reagent
concentrations and improved instrumentation.



CONCLUSION 

As indicated above, differentiation of the fluores- 
cent signals yielded by two fluorophores can be 
readily achieved solely on the basis of wavelength 
differences, and this approach has been relied on 
entirely in our preliminary studies. However, 



other physical techniques exploiting differences in 
decay time of two or more fluorescence emissions 
(using, for example, a pulsed or sinusoidally 
modulated laser source, and time- or phase- 
resolving detectors) are available, and can be 
expected both to further reduce background and 
to improve signal resolution, thus increasing assay 
sensitivity and precision. These considerations 
aside, the basic technology involved closely 
resembles that employed in domestic compact 
disk recorders and other similar data-storage 
devices, the obvious difference being that light 
emitted from each of the discrete zones forming 
the antibody-array is fluorescent rather than 
reflected, and yields chemical rather than physical 
information. Indeed, our preliminary studies 
suggest that highly sensitive immunoassays using 
antibody microspots of surface area approximating
50 µm² are achievable, implying that some
2,000,000 different immunoassays could, in principle,
be accommodated on a surface area of
1 cm². Though non-specific binding of a multiplicity
of developing antibodies would probably
prohibit the use of antibody arrays of this order, it 
is evident that the technology is capable of 
encompassing analyte numbers of the kind likely 
to be useful in practice. 
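The array-capacity figure quoted in this section follows from simple area arithmetic, sketched below. The sketch assumes close-packed microspots with no spacing between them, which is an idealization; the function name is illustrative only.

```python
UM2_PER_CM2 = 1e8  # 1 cm = 1e4 um, so 1 cm^2 = 1e8 um^2


def max_microspots(spot_area_um2, support_area_cm2=1.0):
    """Upper bound on the number of microspots, ignoring inter-spot spacing."""
    return int(support_area_cm2 * UM2_PER_CM2 // spot_area_um2)


spots_at_50_um2 = max_microspots(50.0)
print(f"{spots_at_50_um2:,} microspots of 50 um^2 per cm^2")
```

Any practical array would hold fewer spots than this bound, since neighbouring spots need some separation to be resolved.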

The development of multi-analyte assay sys- 
tems of this kind can be anticipated to bring about 
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fundamental changes in medical diagnosis and 
many other biologically related areas. Systems 
capable of measuring every hormone and other 
endocrinologically related substance within a 
single small sample of blood are within technolo- 
gical reach, providing data which, when analysed 
with the aid of computer-based 'expert' pattern- 
recognition systems, are likely to reveal endocrine
deficiencies only dimly perceived using
current 'single-analyte' diagnostic procedures.
Such systems also provide a means to the 
development of a 'random access' immunoassay 
methodology, permitting the selection of any 
desired test or combination of tests from an 
extensive analyte menu. Clearly the accommoda- 
tion of a wide range of individual immunoassays 
on a small immunoprobe (comparable in its 
overall physical dimensions with a few drops of 
blood) is likely to totally transform the logistics of 
immunodiagnostic testing, and genuinely repre- 
sents, in our view, 'next generation' immunoassay 
methodology.

Throughout the 1970s, controversy centered both on immunoassay
"sensitivity" per se and on the relative sensitivities
of labeled antibody (Ab) and labeled analyte methods.
Our theoretical studies revealed that RIA sensitivities
could be surpassed only by the use of very high-specific-activity
nonisotopic labels in "noncompetitive" designs,
preferably with monoclonal antibodies. The time-resolved
fluorescence methodology known as DELFIA, developed in
collaboration with LKB/Wallac, represented the first commercial
"ultrasensitive" nonisotopic technique based on
these theoretical insights, the same concepts being subsequently
adopted in comparable methodologies relying
on the use of chemiluminescent and enzyme labels. However,
high-specific-activity labels also permit the development
of "multianalyte" immunoassay systems combining
ultrasensitivity with the simultaneous measurement of tens,
hundreds, or thousands of analytes in a small biological
sample. This possibility relies on simple, albeit hitherto
unexploited, physicochemical concepts. The first is that all
immunoassays rely on the measurement of Ab occupancy
by analyte. The second is that, provided the Ab concentration
used is "vanishingly small," fractional Ab occupancy is
independent of both Ab concentration and sample volume.
This leads to the notion of "ratiometric" immunoassay,
involving measurement of the ratio of signals (e.g., fluorescent)
emitted by two labeled Abs, the first (a
"sensor" Ab) deposited as a microspot on a solid support,
the second (a "developing" Ab) directed against either
occupied or unoccupied binding sites of the sensor Ab. Our
preliminary studies of this approach have relied on a
dual-channel scanning-laser confocal microscope, permitting
microspots of area 100 µm² or less to be analyzed,
and implying that an array of 10^6 Ab-containing microspots,
each directed against a different analyte, could, in
principle, be accommodated on an area of 1 cm². Although
measurement of such analyte numbers is unlikely ever to
be required, the ability to analyze biological fluids for a wide
spectrum of analytes is likely to transform immunodiagnostics
in the next decade.



Additional Keyphrases: ratiometric immunoassays · scanning

Immunoassay and other protein-binding assay methods
based on the use of radioisotopic labels have played
a major role in medicine during the past three decades.
Their importance has derived primarily
from the structural specificity of the binding reactions
between binding proteins and analytes and the detectability
of isotopically labeled reagents, the latter endowing
such techniques with "exquisite sensitivity." Nevertheless,
increasing interest has been shown in nonisotopic
techniques based on identical analytical principles,
differing only in the nature of the marker used to
label the reactant (e.g., antibody or antigen), whose
distribution between reacted ("bound") and unreacted
("free") fractions constitutes the assay "response."
The basic aims underlying this interest can be
broadly classed under four main headings:

• avoidance of the environmental, legal, economic, and
practical disadvantages of isotopic techniques (e.g., limited
shelf life of isotopically labeled reagents, problems
of radioactive waste disposal, cost and complexity of
radioisotope counting equipment), particularly those
accompanying the development of, for example, simple
diagnostic kits for home or doctor's office use;
• achievement of greater assay sensitivity;

use * 

In this presentation I will focus primarily on the last
of these, setting out the principles
underlying our current attempts to develop a new
"miniaturized" technology that will permit the simultaneous
measurement of an unlimited number of analytes in a
small biological sample such as a single drop of blood.
However, retention (and, if possible, improvement) of
the high sensitivities of conventional isotopic techniques
is a basic aim not only of our own studies in this
area but also of most other endeavors falling under the
above headings. It is therefore appropriate to preface
this paper with a discussion of the general principles
underlying the attainment of high binding-assay sensitivity.

Immunoassay Sensitivity: Some Basic Concepts
Definition of Assay Sensitivity

The need to establish assay conditions yielding maximal
sensitivity underlay the independent construction
of mathematical theories of immunoassay design by
both Yalow and Berson (1) and Ekins et al. (2) in the
course of the original development of these methods in
the early 1960s. Regrettably, these theoretical studies
led to prolonged controversy, arising largely from the
conflicting concepts of "sensitivity" adopted by the two
groups (see Figure 1). Briefly, Berson and Yalow, in
their many publications relating to immunoassay design
(e.g., 1, 3), defined sensitivity as the slope of the
response curve relating the fraction or percentage of
labeled antigen bound (b) to analyte concentration ([H]).
In contrast, Ekins et al. (e.g., 2, 4) defined sensitivity as
the (im)precision of measurement of zero dose, this
quantity being indicative of, and essentially equivalent
to, the lower limit of detection.

Fig. 1. Concepts of sensitivity adopted by (left) Yalow and
Berson (e.g., 1, 3) and (right) Ekins et al. (2, 4)

The key difference between these two definitions
clearly lies in the dependence of the assay detection
limit on the error (imprecision) in the measurement of
the response variable. By neglecting this crucial factor,
the "response curve slope" definition leads to many
obvious absurdities. For example, plotting conventional
RIA data in terms of the response metameter B/F (i.e.,
the bound to free ratio) suggests that assay "sensitivity"
is increased by increasing the antibody concentration in
the system; however, the converse conclusion is reached
if identical data are plotted in terms of F/B (see Figure
2). Observation of the shape and slopes of response
curves without detailed error analysis thus constitutes a
totally misleading guide to optimal immunoassay design.
This approach has, however, characterized many
of the studies conducted in the immunoassay field during
the past 30 years, and has been the source of much
mythology. For example, consideration of the Law of
Mass Action reveals that, when response curves
corresponding to different antibody concentrations are
plotted in terms of b vs [H], the maximal slope at zero dose
is obtained for a concentration of 0.5/K (where K is the
affinity constant), in which circumstance the zero dose
response (b0) is 33%. This conclusion led to Berson and
Yalow's enunciation of the well-known dictum (which,
albeit erroneous, is broadly adhered to by many
immunoassay practitioners and kit manufacturers) that, to
maximize RIA sensitivity, the amount of antibody to use
in the system is that which binds 33% of labeled antigen
in the absence of unlabeled antigen (1, 8).
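Both observations above can be reproduced numerically from the Law of Mass Action. The sketch below assumes a single class of binding site, negligible tracer mass, and an illustrative affinity constant K = 10^11 L/M; all function names are hypothetical. It estimates zero-dose slopes by finite differences for the metameters B/F, F/B, and the bound fraction b.

```python
import math


def bound(K, ab, ag):
    """Equilibrium bound-antigen concentration (mol/L) for total Ab and Ag."""
    s = ab + ag + 1.0 / K
    # Smaller root of B^2 - s*B + ab*ag = 0, written to avoid cancellation.
    return 2.0 * ab * ag / (s + math.sqrt(s * s - 4.0 * ab * ag))


def b_fraction(K, ab, ag):
    """Fraction of antigen bound (b)."""
    return bound(K, ab, ag) / ag


def b_over_f(K, ab, ag):
    """Bound/free ratio (B/F)."""
    b = bound(K, ab, ag)
    return b / (ag - b)


def f_over_b(K, ab, ag):
    """Free/bound ratio (F/B)."""
    b = bound(K, ab, ag)
    return (ag - b) / b


def zero_dose_slope(metameter, K, ab, h=1e-4):
    """Finite-difference slope of a response metameter close to zero dose."""
    ag1, ag2 = h / K, 2.0 * h / K
    return (metameter(K, ab, ag2) - metameter(K, ab, ag1)) / (ag2 - ag1)


K = 1e11                             # assumed affinity constant, L/M
low_ab, high_ab = 0.1 / K, 10.0 / K  # two antibody concentrations

bf_low = abs(zero_dose_slope(b_over_f, K, low_ab))
bf_high = abs(zero_dose_slope(b_over_f, K, high_ab))
fb_low = abs(zero_dose_slope(f_over_b, K, low_ab))
fb_high = abs(zero_dose_slope(f_over_b, K, high_ab))
print("B/F slope rises with antibody concentration:", bf_high > bf_low)
print("F/B slope falls with antibody concentration:", fb_high < fb_low)

# Scan K*[Ab] for the steepest b-vs-dose curve at zero dose.
grid = [0.1 * i for i in range(1, 31)]
best_x = max(grid, key=lambda x: abs(zero_dose_slope(b_fraction, K, x / K)))
b0_at_best = b_fraction(K, best_x / K, 1e-6 / K)
print(f"steepest b curve at [Ab] = {best_x:.1f}/K, where b0 = {b0_at_best:.1%}")
```

The same data thus give opposite rankings depending on the metameter chosen, which is the point made in the text: slope alone, without an error analysis, does not identify the more sensitive design.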
Disagreement regarding the concept of sensitivity
inevitably led to prolonged dispute regarding
immunoassay design (5). However, although it is still common
to encounter publications in the field that rely solely on
the response curve slope as a measure of sensitivity, the
assay detection limit is now widely accepted as the only
valid indicator of this parameter, and we do not therefore
intend to dwell further on this issue here. It is
nevertheless relevant to an understanding of the
"miniaturized" assay methodology described below to
emphasize that untenable concepts of both sensitivity and
precision underlie many of the commonly accepted rules
governing current immunoassay-design practice, some
of which are contravened in our own approach.




[Figure panel titles: Response curve slope; Detection limit]

Fig. 2. Schematic representation of RIA dose-response curves
for high and low antibody concentrations plotted in terms of
(left) the free/bound fraction (F/B); (center) the bound/free fraction
(B/F)
Note that the low antibody concentration yields a response curve of greater
slope when the assay response is plotted in terms of F/B, but of lower
slope when plotted in terms of B/F. The detection limit
is independent of the coordinate frame used to plot assay data.



Basic Immunoassay Designs 

It is likewise important in the present context to
comprehend the basis of the various types of immunoassays
currently in use, and the constraints on the
sensitivities of which they are potentially capable. The
radioimmunoassay and analogous protein-binding assay
techniques originally developed for the measurement of
insulin by Yalow and Berson (6), and of thyroxin and
vitamin B12 by Ekins and Barakat (7, 8), relied on the
use of a labeled analyte marker to reveal the products of
the binding reactions between analyte and binder
(Figure 3, left). This approach has subsequently often been
portrayed as relying on "competition" between labeled
and unlabeled analyte molecules for a limited number of
protein-binding sites, such assays being frequently
referred to as "competitive."

Subsequently, Wide et al. in Sweden (9), followed
by Miles and Hales in the UK (10), developed
labeled antibody methods (Figure 3, right). These methods
represented an extension of the "labeled reagent"
methods (using radiolabeled organic compounds such
as 131I-labeled p-iodosulfonyl chloride, [3H]acetic
anhydride, and other similar reagents) devised, during the
early 1950s, by Keston et al. (11), Avivi et al. (12), and
others for quantifying amino acids, steroid and thyroid
hormones, etc. Although radiolabeled antibody methods
(immunoradiometric assays; IRMAs) were originally
claimed (13) to be more sensitive than methods based on
the use of radiolabeled analyte, these claims were
supported by neither rigorous theoretical analysis nor
persuasive experimental evidence, and for some time
remained controversial. Further doubt on their validity
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fig. 3- Labeled-anaJyte (te/n and labetod-ansh**. 
systems compared "aoeieo-antibody (nctt) assay 

l*^d*MlK«« *>My system* essentia^ ~iv_^„ . 

to rev**) n» produce tiEn^^^™** * "» »"*Me 

•» optimal antbooy cenannjbon l%a^^^^ b ^ n ^6on. 

PwJua. «( the bfndne •■MfcntaSLZ,*! ^■A*'" to reveal o, e 

»••* aero when the W-ewfbo^TJSE 5^* XJnBZ ' ton* 
WWy when the bound frsa^ ^,^T^^' ^ tantJ « 
b«ekp>ouno) oeiermined (Mcewise assuming zero 
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was cast by the publication by Rodbard ™* vu ■ 
1973 U4) of detailed theoretical SSeTd^J, " 
tkat both labeled analyte aJhiT^Zl^?* 
Poised essentially ^JS^^fc^ 
authors suggested that irmas might tem^ri^J^ 
the assay of small polyt-ntiS T7 ? ^■"■"jvem 
^corporation into ti^lS^^ 
conversely, these assays would be less 

ertheless, despite the appearance of tniTr^iT!? * 

belief that labeled aB^e^^^^ 

«cally more sensitive than the a*J*L!S> . 

anaJytemetbcxbgabed™ea^^T ndmg ^led 
chemists, acceptance among clinical 

He reason for confusion on this issue i» thof *k 
greater potential sensitivity of certain 
not really a consequence oftSTaS^LS " 
opposed to analyte: indeed th« 7,^ of « mt,bod y «* 
between labded-amuyte and laMeTS 
A verts attention ^ e ^ e ^r^^^f 1 
superior »ensitivityofc^ a ^^ d ^^ 
analysis (see, e.g., <. 25) revT^TSsum^tf 1 

rr " para ? on ? the pnrfucto * * *3K2e 

u.e. f no nusclassifi cation ofbnimd o^j ^* reaction 

optimal antibody -Sfiff^.JS? 
i^malabeledandytei^oity^ 

analyte fraction is measured, where*. tJuwj j . 
body methods the optima^HbS^T I * eW -«ti- 
fends on which labeK^^^^^ 00 ^ 
(see Figure 8) . If the free (vJZZ$£££ SUT^ 
measured, the optimal cm^o^S^" 1 U 
^versely tf the analyte-bound ta£ t£2£7 
toe concentration tends to infinity In short nf*^r ' 
basic measurement strategies av^lawS^ 6 ^ 
lyte, with measurement of free tr k! T~ 6(1 afla " 
act, and labeled Jb* a£ ^tLT° n pn * 
free or bound product^nly ^r^^T™™* of 



emphasised that wchTsSS* * *™tbe 

from the original meanfcg? a^T£? * d * artB * 
and ^onampetitive-^efth^ de^i^ ,Dpeti ^ € " 
"wd in the present context T^ll 1?°** Were 6nt 
assay, ma/be suWa^^^J^^ 
labeled reagent of any kinfis" t&iT* ^ D ° 

maximaJ assay sen^^SoW^S^ 
rng reasons for the eastenc* ofrt^T^ ^ underly- 
desig^, and may ~£E£rK^ 
may be more readily underfed S^^rTJZ^ 
such assays are nartru^ a " ~ pnnaplea of 

the binding reVctioZ a ^!uTT C ° natent ^veming 
analyte pre^ntt^e n^^T^^ 0 ^ 
immediately from the J^^'J^l np0e $° n 8tea » 
written as' ° fMaM Acbon . which can be 



$$[AbAg]/[fAb] = K[fAg] \qquad (1)$$

giving the fractional occupancy of antibody binding sites,

$$[AbAg]/[Ab] = K[fAg]/(1 + K[fAg]) \qquad (2)$$

where [AbAg], [Ab], [fAb], and [fAg] represent the
concentrations of occupied, total, and unoccupied antibody
binding sites and the concentration of free analyte,
respectively. When the total antibody concentration is
sufficiently small, free ([fAg])
and total antigen ([Ag]) concentrations do not differ
significantly, and fractional occupancy of antibody is
given by

$$[AbAg]/[Ab] = K[Ag]/(1 + K[Ag]) \qquad (3)$$

concentration (see below). 8Bd ^bbody 

n^nt of the jfracbonal occupancy- of the senaoTa^ 

antibody bm^t^^rT^ 

attainment of mS^^ 0 !^ 1 ;^ 
anybody ^cent,^. ^X^: ^ ^ 
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may therefore be categorized « B « 

pemft (ta principle) the^o^iaSetS 

sure a small ouantitv k„ -T ™aeairable to mea- 

J£* " nCept8 «« iUu-^ted in Figure 6 which 
Portrays basic immunoassay formats cu^i 
men use. Conventional WAanrfn^f m com - 

^yte-tecbniques^vT^ other ^ar labeled- 
binding sites, S^SS^ 
fjaeous or sequential) ^Sh kbSera^^r 111 - 

typic antibody (reactive oSy1Stn^£ ^ 
on the sensor antibody) ma/be us^^" 1 
Purpose. In the case of single^ latefJl^^' 
»aya, the labeled antibody haStJ^rZT^ 0 ^ as " 
antibody; after reaction w^^L^^"^ 

-y can St ^S^Cotet i ^ 

assay is "competitive - UnmUn ° aorbaat >. then the 

Two-dte "sandwich" assavs a«, i 
becauae they rely on S^"™* 1 " 
ered from two points of view FW™,/^ con£id - 

classed as rnoncompetiti ve » 8May8 ^ b < 

mSsW^^r emphasi2e the differences 




"coMprnnve nmuNOAss«r- 



♦ 



*-y in which anSST"^ Tvnunouuy, 

component (if any) of th*> r* a ^ 
Indeed, in the JZitSSSS^™** 
eors." no component is labeW- J^T ™ munoBa »- 
of the immunoLnBor JSiZ * n . everthe lass. the design 

on whetherTmSfe^r^ ft de ^ 
or unoccupied antibody bS^tS^PW 
surface. In short the i/*™* « situated on its 

petitive" merely reflet ""P*** *»* "aoncom. 
detennination of a ^t^V^"*" to *• 
"tee and lead J SftSSLT^ "S^ *"""» 

dom errors arising in the dSSEfli 

Competitive and noneomnetitivrS, 
be shown to differ sinS^ ««nunoassay, can 
mance AuuSSJSEiZ^ T* y of their perfor. 
both types of SSL^S^?* ^vities. In 

in determining sensi^ J££ In^r^T 
theft «nsitivityofcompetitiv.«.«l„ • ' D P ractl «. 
* the a ^ -tatHrthTS^ 
specific actjvity of the label is moreSr^tT^ 

competitive systems. In both ca^thf?^f U !°t 
or "manipulation" error in th?t cxpenmental' 
zenniose response SSi ? <* the 



zertHiose response (Rj n I SI of the 
arising from^ipettiS SSSl r" 
-luding the statistical signal me^^ ^ 






•e] is of key importance is determining "potential- 

ing the specific activity of the label to be in££ 
implying zero error in eignal measurement). Thus the' 
potential activity of a competitive assay be 
shown to be whereas that of a non^mp^ti* 

aeeay la J^WbR where, in 
eaa* R. u jammed to represent the labeled antibody 
miaclaaai&d aa boundflhAbU commonly referr* I tow 
nonapeofically bound- antibody. Thus IWAM-f £ 

bound, and Ro^/lAblCTo = f^/KR,. AawiW th* 

^SSJ^T* 18 a PP™«iaately identical for both 
competitive and noncompetitive usura i» .77 
from tbia simple a^^^E* ^StC 
of noncompetitive methods is greater thanth^f y 
petitive methoda by the facta Ti f b7^!l ^ 
labeled antibody that 

sample, tfthe nonspedfically bound tracti** 0 0 f% 
a noncompetitive strategy ia potentially carabl* «f J 
^ritivfty lOOOMUd greater *an tha . o7^L^ 
bve approach, other factors being equal ^ 

pressed in term. *m^£So£Stt£ 



[Figure 6 panel titles: Competitive; Noncompetitive. Axis: log molecules/mL]
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Flp, 6. Jtwnteatty predicted sensHMtles of 

eompettiva Immunoassay methods Cs^ST^^ n ° fV 

f»lyte measurements. upressM «S«^,! ,e SD °' 2ero 

N** In nanmnpatHjwt undwieh sum ,-«^. _ 

•^^r^£^^^^"*sv * 1* fnt 



***** For no«»aBrtli^«i« , ir!S^^ wn * Dh,,r 

to 0.1% or t " x * n ° * 



bbel of infinite .Md^S^ (o) "* ■» * » 

(ra which SSLtaS 
MOW in.Figure 6 rely) were baorf^Tii^VT^ 
"WinpSonB that M il,. j J . 011 <<«W 

ass 

Iff J*"- 1 

BBi»«i ,tUe im P«>vement in eeiiaitivitvil 

suffest that radioi^ n^^ ^^^^on. 
activity than 12 *I (, . ~t *? . of . m « !h low «" specific 

taply would utilize lJ^nSS%SS M 7" 

the order of 10" S ^ ^tiea on 

The results of a similar analysis of »K« 

j ""'en. «•?.. the crucial importance f r«H»« 

nonspecific binding of labeW *t,£~a* reducing 
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by una* an antibody of K - io» LW M 0Bfe . . . 
noncompetitive assav design D .\ 80 opoauzed 

-jr-.J," - fles * laa by-unng an antibodv of 

K =^10" L/moI m a competitive methnHI rv, .J*! * f 
important conclusions £ th*7£. rfthe «*t 

to sensitivities ofSe^def of lO^*^ " prartice 
mow. In short, ahho«Xu£i^ • or 
noncompetitive muX^SSL CUam8 ^^. 
than corresponding ^t^^^^T^ 
of the same antibody in e**«l!!JT^ m ** ^ 
« advantages t^SSS^** "~ 
itive approach can be realised «S! h • noDCO *Pet- 
labels of much hirier^L ^ b ^6»onisotopic 
superiority otrtF&SZtS*** ™ l 
are combined with ifieSS? ^JE* When ^ 
Figure 6 demonstratita^en Wver « 

Ma*^ 

unmunoassay methodoW n «TL Auorometric 

«, Hub J^C'ZZ iff^™* 
n lUMtopic immunoMeav „mu7 ^tra-sensitive" 
T*e same baric^^S^ 1 ^ * devdo P ed - 
ado^bymanyotb^^l^ 1 ^^ ^ 
of hagb^c activity wJTKmTS ^ 8 

Against this background, let ui nL 
deveW of highly senSt/v?, **• 
spot- immunoassays and multianalyt aaaX^aT 
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*™*>bg this methodology in 

The recognition that all in»nn 
rely on measurement of anti^^"^ esacntial V 

immunoassay (76). Thi/n^^^? 1 /^^ 
assay systems that, unlik*^^ to deBcn 'be 

sure the analyte <xn£n£^ Z*** 1 mea- 

Pie volume a Jof^^ tt ? ependeat both ofaam- 

of &, aZ tSKtszsfir • fiwntie 

representing the fr»2 ;„l i 6 foUow in* equa- . 



Ambient Analyte Immunoassay 

Particular attention has been given above to the
specious notion that an antibody concentration
approximating 0.5/K is required to maximize the sensitivity of
conventional labeled-analyte immunoassays, since this notion is
explicitly overturned by the development of "ambient analyte"
immunoassays, which form the basis of
a new generation of

Specific activity

1 ??^ove«i)er second per 
7.5 x io- labeled rjioleculaa 



Enzyme label 

Chemiluminescent label
Fluorescent label 



"monoclonaJ") antlxxiy ^^^^^ or 
various analyte concentr»7^- , ^ Presence of 

body concantSicrS^^ 0 ^ agaiMt 
less than (aay) o OLKT an antibody concentration of 

esaentially. W I^LS but ^ 

to an andyte^Sg nTeiut i UPP ° rt) , iS : 
bona!) occupancy Tantib^dT?^ 6 
fleets the ambient c^«n^ti 0 nt? g re " 
^dependent of the to^ ^?°° of e and is 

system. Oft fo r exaLSjY? S S ta ^ 

binding-site concentration rfnlr ° 1, " *»tibody 
10"" mow, or 6.?2 x K°'° 1/Jr *P«sents 0.01 x 
binding by antibody to** ' 

small, the resulting JuctiT^? a ? mm ^ « 
boa of ^^^gSc^ ^^^"^tra- 
<0.0Mr. analyte t^'T*' .^bodies is 
and the Bys JL h ^-^J-JJ 

2^SS^& rt ^^«nit.ba. 
away dau. The t«W^ i7?T^ "P^^tion of biadk, 

«W« .with [AbJ and [An]. Fo^tSS^T^ d "» ^inTUhW 
10 molTL (reprinted m 0 f iyA^f// ^"^tion 
cal far otf antibodies ffi w^rf ^"-^ < 

iae term "ambient" ia »Jj • j. 
paacyreflect, tb e a aa lAeeon^f«^ C * , l ^ ^^'body oco> 

e - tystea » ^pendent of sXle^^ "^^^ tube; 
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Antibody concentration * M 
drffwenl values of analvto /^SSi i2r V "' to cwx:8nlra ^ tor 

concentration. ^K^p^S^^ ^ N °!L^ tor 

BeraonUJ) >^ m ««h the pracepa * 

dent of sample volume. 
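The "ambient analyte" condition can be illustrated with a short mass-action calculation. The sketch below assumes a single class of binding site, an illustrative affinity constant K = 10^11 L/M, and an analyte concentration of 1/K; the function name is hypothetical. It compares the exact fractional occupancy with the limiting value K[Ag]/(1 + K[Ag]) of Eq. (3) as the antibody concentration is reduced.

```python
import math


def fractional_occupancy(K, ab, ag):
    """Exact fraction of antibody sites occupied at equilibrium."""
    s = ab + ag + 1.0 / K
    # Smaller root of B^2 - s*B + ab*ag = 0, written to avoid cancellation.
    bound = 2.0 * ab * ag / (s + math.sqrt(s * s - 4.0 * ab * ag))
    return bound / ab


K = 1e11                           # assumed affinity constant, L/M
ag = 1.0 / K                       # assumed analyte concentration, mol/L
ideal = K * ag / (1.0 + K * ag)    # Eq. (3): ambient-analyte limit

for x in (10.0, 1.0, 0.1, 0.01, 0.001):
    occ = fractional_occupancy(K, x / K, ag)
    deviation = abs(occ - ideal) / ideal
    print(f"[Ab] = {x:g}/K  occupancy = {occ:.4f}  "
          f"deviation from Eq. (3) = {deviation:.2%}")
```

Once the antibody concentration falls below about 0.01/K, the antibody removes under 1% of the analyte, so occupancy tracks the ambient analyte concentration regardless of how much antibody or sample is present.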

These conclusions lead to two fijrt)... „ , „. 
the antibody may be confinej ft Pt * ? Tl 
support, such that the total aitSSS£° V 2 nd 
*. within the -ta-^l^xl^^ 
- the sample volume to which th. «»JL . . ' here w 




Dual-Label Microspot Immunoassay 

ure 8 K1l ^^ontaining fluid (see Pig 

the -^ y T^roS, Pied « 

of Bensor and developing antibodies^T^^ j ,° 
antibody "counUta " t£ , ra fona the dual. 

w^aTIw JKL? ■*? ^ achieved * 

differe/t labe^Tg a ^T^J****" ^ 
chemiluminescent mark^T £ f radloa f ta l ^ "*y**, or 

different nat^FSo^ wr ^ of entire, y 
ticularly "^^^^^Pf ^ally par- 

optical BCMmiiigtedmiQu^X^Qwl 7 *• rf 



Fig. 8. Microspot immunoassay: noncompetitive assay (left) and
competitive assay (right)

Fig. 9. Basic principle of dual-label, ambient analyte immunoassay
relying on fluorescent-labeled antibodies [...]

advantages stem from adopW B fl 
meaaurement For exampteXj thtl fluorescen « 
distribution of the sensor I^Cf ?! 0UDt ^ the 
field of view £ taSSTt^JS^*^ 
emitted fluorescent St uSLel?, I* 110 of 
tuatioas in the intensW?k^^ Lliewise ' Auc- 
beaxn arc apt toTSrf££^ (e * dtin tf «k*t 
tages are adfitio^ to ^S?^' ^ *<W 
this approach, LeTtia Sfc^S^, - *^ *« 
stancy of the amount oftn^T eMurin » 
assay system* removed. ^ to the 

Microspot Immunoassay Sensitivity

approach is *&* of SSZ noS^t* 
cation that nucrospJt ^ZT^fZ^ PT °» 
fave M conventional systemslKtTelv ^ "i* 5 "** 
amounts of antibody nLyTeadfo J* " lMBBr 
consideration of a inodTr^ i!f den "»»*»fd by 
sensor antibody molecX^L^ J! that 

e»P<»ed to the analyte Taridtt^t ^* mnain 
a»alyte is thereby u^X^i?^ for *« 

support divided bv the inJ^T 7 ^ aitea on the 
by such attacW^^^ v V0,Wafr - i « ^ected 
at eouilibriun, ^C^^T^ * ***** 
antibody is distributed unifo^iwu °f curru >6' if the 
bation mixtureTut^^^ the W 

aolecuJeTeiS u^SE?* 0 "** ««body 
surface den^ou\ne^£ S^lf?" 

of sensor antibody implies * J5L ! COnCentratlon 
the surface areaoveTwhich S^SE?^* in 
If, for exampT ^tibod?.* t £° dy is ^buted. 

^e a £2E* ^ts- ^ 

-^ysurfacedeneityis6(^ 



a surface area erf 10*^1 
. antibody bindin* sjt^TrLl 7 ^ accommodates 

tration of 0.01/JT, etc Let utLSZZS*? *° * coa « a - 
"^P^oftfcsei^ 

ing analyte at «ZT ~ tlb ? dles to a mediua contain. 

anaJyte.fonnmraryni^^-S ^^ 4g ^ th « ' 
let us suppose ti^^^.^^- finally, ' 
developing aatibodv trit^i Mtes *** with the : 
specific^ to ^f'^S latter 8180 bi «ding «W 
density of 1 nfole^eW ^ ^ at 

^SraTt^^L^ rf ■ P^ve 
1 mm J (effective anSviT^ Surface areafrom (e*) 

equafaon 4, the value of Fior Z . i Fro,a 
10- 3 . Thus at equiliSI a 1 fm 2 area is 4^8 x 
labeled antibody ^S^S^ 1^ ^ 
area is 2.99 x 10' t^SJuS^ *° ^ 

molecules present) tt" 1 50 * of ^ ^ analyte 
tibody SS£ °f labeled £ 

restricted to the area oa £ ts^ 118 instrument is 





specifically bound labeled antibody within the instrument's field
of view), the signal/noise ratio observed for the 1 mm² area is
~30. Similarly, the value of F for a 0.1 mm² area is 9.02 × 10⁻³,
the number of labeled antibody molecules specifically bound to the
area is 5.41 × 10⁶, the number nonspecifically bound is 10⁵, and
the signal/noise ratio is ~54. Likewise, the signal/noise ratio
for a 0.01 mm² area can be shown to be ~59. In short, the
signal/noise ratio increases as the antibody-coated surface area
is decreased, approaching a maximal (plateau) value of 60 as the
area coated with sensor antibody falls below 0.01 mm² and tends
toward zero.

[Figure: effect of the instrument's field of view and of the
antibody-coated area on the signal/background (S/B) ratio;
caption not recoverable]

If, however, a reduction in the antibody-coated area were not
accompanied by a corresponding reduction in the detecting
instrument's field of view, the resulting reduction in "signal"
would not lead to a decrease in the background generated by
nonspecifically bound labeled antibody [...]. Therefore, although
reduction in the coated area would increase the fractional
occupancy of the sensor antibody, the signal/noise ratio would
either remain constant or [...]; in such circumstances it might be
advantageous to [...].

Likewise, if the background signal generated within the detecting
instrument itself (e.g., from the photocathode of a
photomultiplier tube used to detect the light emitted from the
antibody-coated area) [...], a maximal signal/noise ratio would
also be attained at some optimal value of the coated area, below
which the ratio would fall. If, however, one can generally reduce
the size of the detector (and hence the detector-generated
background) at the same rate as the size of the signal-emitting
area, there is no reason, in principle, for the signal/noise ratio
to fall as the antibody-coated area is progressively reduced
toward zero. So, if we accept the proposition that [...] of the
measurement of antibody occupancy (and hence of assay
sensitivity), these considerations suggest that it is advantageous
to reduce the antibody-coated surface area (and hence the
sensor-antibody concentration) [...], although little advantage is
likely to accrue from reducing the area below 0.01 mm² (and thus
the antibody concentration below 0.01/K).
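The dependence of the signal/noise ratio on the coated area can be
illustrated with a short calculation. The following is only a sketch
under stated assumptions, not the calculation used in the article:
it applies the ordinary mass-action equilibrium for a single class
of binding sites, with all concentrations expressed in units of
1/K, and it assumes an analyte concentration of 0.01/K, a sensor
surface density of 6000 binding sites per square micrometer, a
nonspecific binding density of 1 labeled molecule per square
micrometer, and that a 1 mm² spot corresponds to a sensor antibody
concentration of 1/K. The function and constant names are
illustrative.

```python
import math


def fractional_occupancy(ab, an):
    """Equilibrium fraction of antibody binding sites occupied by analyte.

    ab, an: total binding-site and total analyte concentrations,
    both expressed in units of 1/K (so K itself drops out).
    """
    s = ab + an + 1.0
    # Mass action gives bound^2 - s*bound + ab*an = 0. The smaller root is
    # written in a form that avoids cancellation when ab is very small.
    bound = 2.0 * ab * an / (s + math.sqrt(s * s - 4.0 * ab * an))
    return bound / ab


# Assumed values (see the lead-in); none of these is asserted by the source.
ANALYTE = 0.01               # analyte concentration, units of 1/K
SITES_PER_UM2 = 6000.0       # sensor binding sites per square micrometer
NONSPECIFIC_PER_UM2 = 1.0    # nonspecifically bound label per square micrometer


def signal_to_noise(area_mm2):
    """Specific/nonspecific label ratio when the field of view equals the spot."""
    ab = area_mm2  # assumption: a 1 mm^2 spot corresponds to [Ab] = 1/K
    occupied_per_um2 = fractional_occupancy(ab, ANALYTE) * SITES_PER_UM2
    return occupied_per_um2 / NONSPECIFIC_PER_UM2


if __name__ == "__main__":
    for area in (1.0, 0.1, 0.01, 0.001):
        print(f"{area:>6} mm^2  S/N = {signal_to_noise(area):.1f}")
```

With these assumed values the ratio rises as the area shrinks and
levels off just under 60, and the occupancy itself tends to
[An]/(1 + [An]) (with [An] in units of 1/K) once the antibody
concentration is well below 0.01/K, which is the "ambient analyte"
condition.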
Were the microspot area indeed reduced to ze» both 

ratio between them nevertheless remaining essential 
constant^ implying thatno W 
the hnut, be recorded In practie , t ^ Jg« 
acton come into play when the Dumber of individual 
event. (e.g., photons) observed by a detrt£ta2S 

"ensor antibody concentration to zero. T^e Mint «♦ 

ft. ,W* W « .^,1 to be lost aufficientlv to affect 5 



ttcll^o^T^ rf «cupancy 
•SSf a s P eafic activity of the labeled 

anybody used to measure the occupied LiinTah^S 
higher the specific activity, theamaller SSSSSSl 
are* Thus, given labels of very high specaHctiSv 
one can envision circumstances Ti££V? 

sensor antibody may be exceedingly low. A morere* 
StZT* ^ 8 Variety rf ^cluSng^ 
the labeled antibody (or labeled analyte) S 

Jirtad I impos^bdity of formulating general 
gardxng this For example, reagent concentrations that 
are optanal for isotopically labeled reagents used with a 
oonvenbonal radioisotope counter (poesesamTa^ 
background dependent on its basic %a3Si „ 
hkely to be entirely different when very^SdS 
^labeUareue^andoneh^the^ot^ 
Aemeasunng instrument to samples of any size. In 

^i!?^™ ndu8ions **** ra experience ofRU 
and *ma techniques may prove mislead?*? wheV£ 

A more detailed theoretical consideration of (noneom- 
P^v^m^rospot immunoassay sensitivity (2^ 

x 1(6 x io»Ki + [Ab»Dyz>*[Ab») (5) 
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Ltiblfv V™*" de ^ (binding sites'^ 0 f aensor 
antibody, K = sensor antibody affinity (Un 0 l) fAh?- 
c^m.tion of labeled antibody in S^Jfi button 
pressed m umta of lyjf*. where ^ - labeled aSboSy 

ifSS-i tSSJL detectable surface densiS 

of labeled annbody (moleculesW). and C.T= S 
detecun . limit (moleculea/mL). For example^ [AbTJ 

20 molecules/^ then = 2.4 x 1 0 - molecuhSmL 
= 4 x 10 moVL end the fractional occupancv ofS. 
binding sites of the sensor antibody bT^Tn.l™ 
detectable concentration of analyteis 0 ^04% S^^? 
Bhow.thetbeoretic^asaays.n.i^^ 

A similar theoretical analysis of competitive mi«^ 
spot immunoassay indicates that potential semTibS 

venbonaJ competitive methodologies. In summary 
ateve considerations indicate that the a^Sntrf 

of molecules of sensor antibodies within the inimuS 

accurately measuring very low surface densities of de- 
veloping antibodies. They also suggest tSt« iil 

obtainable by conventional isotopically baaed bmmT 
noassays are achievable, and (6) tf latVc^verv^h" 
si^fic activity are available, the mJwE%& 
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n" 001 

menu pwn5ctoJ«*«Si^L?? e,, * ,6on «* l/< CuZT mo(ea "«'>*m». 



^ sucnitspot aoMva 

Spending on the ^ctS^A 1,6 **** and 
anient* used) cou^E^? ofthe measuring in- 
achievable in mJZ£E£* tte «33£ 
ffl 8n- P assa ^ ^ conventional de- 

« 2S^£$ addreM a A^rther question 
t» JZr 10 this context i • tkTu- on occa " 

tenstics of microspot aasavsTsL *' charac- 
regarding this isaue. Jg? flhould **ma£ 
sending antibody, the W r ^l^ ,er *• »*»*55 
the velocity of the SiS y ^^°f. 
that at the limit (i. bu,dia * Action, ao 
"tuated within the nSr±?^ of «*5* 

J^etica ofthe reaction^^Jf™^ «n» the 
Wgeneoue li^nd-phaaVSf obeer ^ « a 

effective concen^*,^!^ J""* •'though the 
*° n Medina is ncSStt'Z*** « the in^S! 
^chaenaor ^^ZZ^^ 01 ^ rat, at 
•pot become occupied i* irZlJ ?, the micro, 

cumstaace thaTwhen . J??^"* «»•*«• in thi/tiT 

^oae of n^coopetitive T*^' Particularly 

» »W the relationaSp S^^*^'^ 
of aeneor antibody and the riSS/ A ^ et,0Ml oc^pancy 
^ it i. ^dea^fc* ratio^Ued 
the ratio rues is greatest^!! l .^ e «t which 

la f tru n>antation whose field«f 2 • Ieaat ^ given 
^crospot area, the high** t^'Z M . rertri cted to the 
observed (^nyse^^* 01 * ** ™ be 

<0MK. Insert, contrary . flvst «a is 

P^^andtottegenerX^^^ 
^unoaaaay incubation^^^^that short 

^ rf -Mr. ^SSyt^a? 



* ,l " ,a *<*»»„„ 

T Audit, 5* our pmUmJ 

fli^cena, ^ m XZ\ n «™ re m«( «f fell 

, ,e *«-' toe laser scajinin* r^^ZT/^ m3tr umenta- 
not specifically de fi i^^°^cro S cope), ^ 

*T in dem^^? 6 P^t purpoae, has 

bSS apprcach - ** feaafbait y of SI 

^ ^^OhT^^ flu ,°, re8cea « microscoDe* « 

jr ^ ^^^r 1111 ^ ^Sd 

f'nts thua poaaeaa 7h* ^fTf- J**"* at other 
detector. Such systenL cT a £^^ of 
fluorescence microscWL^.T* 11 ronv entionaJ e^! 
P 08 ^ to an eaaentiXT^f W ^f* the 'P^wVul 

aUd » a denned pS^f^^tenMtterBafta. 
spontaneous eJt£ h l *?T' Electro^ 
«thode contribute to tte^^^PW 

81tl ^ty-be minim^p^ gl1 ^ ^^P* aasay a«! 
^^nta^n^efea^^ C 
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to diminish with future improvement in photomultiplier 
design. Other sources of background include fluores- 
cence emitted by components in. the optical system 
which may not, in current instruments, have been 
constructed with background reduction as a prime consideration.
Nevertheless, they detect fluorescent signals with high
sensitivity. For example, one commercially available microscope is
claimed to detect fluorescein at a density of 10 molecules/µm².
Most commercially available fluorescein isothiocyanate
(FITC)-labeled IgG exhibits a fluorophor/protein ratio of ~4; this
implies a detection limit (Dmin) for antibody surface density of
two or three FITC-labeled IgG molecules per micrometer². This, in
turn, implies a theoretical sensitivity for a two-site immunoassay
of ~2-3 × 10[...] analyte molecules per milliliter, assuming
identical parameter values as above, or [...] × 10[...]
molecules/mL if the sensing antibody has an affinity of 10[...]
L/mol. Clearly, sensitivity may be increased by loading more
fluorophor either directly or indirectly onto the antibody.
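The step from a fluorophor detection limit to an antibody detection
limit is a simple division, and the conversion between molecules
per milliliter and moles per liter used for assay sensitivities is
a scaling by Avogadro's number. A minimal sketch follows; the
function names are illustrative and not taken from the article.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def antibody_detection_limit(fluorophor_limit_per_um2, fluorophors_per_antibody):
    """Lowest detectable antibody surface density (molecules per um^2)."""
    return fluorophor_limit_per_um2 / fluorophors_per_antibody


def molecules_per_ml_to_mol_per_l(molecules_per_ml):
    """Convert a number concentration (per mL) to a molar concentration."""
    return molecules_per_ml * 1000.0 / AVOGADRO


if __name__ == "__main__":
    # 10 fluorescein molecules/um^2 and ~4 fluorophors per IgG
    print(antibody_detection_limit(10, 4))  # 2.5, i.e. "two or three"
```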

Our preliminary studies have relied on a less sensitive
microscope, albeit one possessing facilities for dual fluorescence
measurement. Its argon laser emits two excitation lines at 488 and
514 nm. It is thus particularly efficient in exciting
blue/green-emitting fluorophores such as FITC (excitation maximum
[...]) [...] fluorophores such as Texas Red (excitation maximum
596 nm). However, the ratiometric assay principle permits
considerable variation in detection efficiencies of the two labels
because the [...] the antibody couplets can be chosen to yield
signal ratios approximating unity. Inefficiency of the argon laser
in exciting Texas Red is thus not a major handicap in this
context. Though this instrument relies on a conventional
microscope and not on an optical system designed for this purpose
(and is thus implicitly less sensitive), it permits quantification
of fluorescence signals generated from microspots of any selected
area. Initial studies have revealed that, under conditions that
are [...], the instrument is capable of detecting ~25 FITC-labeled
and (or) 150 Texas Red-labeled IgG molecules per micrometer²,
while scanning an area of ~50 [...]

The development of microspot immunoassays has also necessitated
closer scrutiny of the mechanisms involved in the coupling of
antibodies to solid supports. In the present context these should
display the ability to adsorb (in the form of a monolayer), or to
covalently link, a high surface density of antibody, combined with
low intrinsic signal-generating properties (e.g., low intrinsic
fluorescence), thus minimizing background. We have examined a
number of candidate materials, such as polypropylene, Teflon,
cellulose and nitrocellulose membranes, microtiter plates (black,
white, and clear polystyrene plates), glass slides, and quartz
optical fibers coated with 3-(aminopropyl)triethoxysilane, etc.,
and several alternative protocols for achieving high monolayer
coating densities. These
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studies have exposed phenomena neither evirW ~ 
importance when antibody binding to ZhA^Z * 

used white Dynatech Microfluor microtiter plates, formulated for
the detection of low fluorescence signals and yielding high
signal/noise ratios and [...] densities of functional antibodies
(~5 × 10[...] IgG molecules/µm²), for assay development, although
such plates are not ideal. Indeed, deficiencies in the antibody
deposition methods used constitute the principal source of
imprecision in assay results and the limitations on sensitivity
that this implies. Clearly, [...]

[...] of present instrumentation (which, among other things, does
not permit the use of time-resolving techniques to distinguish the
two individual fluorescence signals either from each other or from
background fluorescence) and the crudeness of present methods for
coupling antibodies onto small areas, we have verified the
theoretical concepts outlined above by comparing the performance
of several assays when [...] in a microspot format and when
conventionally designed. Although unoptimized, ratiometric
microspot assays have yielded sensitivity values closely
approaching those of conventional optimized assays. [...] of a
ratiometric assay system for thyrotropin, [...] use of Texas Red-
and FITC-labeled [...] Fig. 13. Bearing in mind the [...] of
[...] conventional fluorophors when used as immunoassay reagent
labels, such results are encouraging, although further work is
clearly required to achieve the considerably [...] theoretically
predicted with the use of improved fluorophors, better
antibody-microspotting [...] purpose-built [...].

The finding that highly sensitive immunoassays can be performed
with far smaller amounts of antibody than




are currently used conventionally permits, in turn, the
construction of antibody microspot arrays enabling, in principle,
the simultaneous measurement of thousands of different substances
in 1-mL samples. In collaboration with investigators at the Centre
for Applied Microbiological Research, Porton Down, U.K., we are
presently developing various techniques for the creation of such
arrays. Indeed, similar technologies have recently been used for
the parallel synthesis of several different polypeptides, these
enabling 10 000-microspot arrays to be constructed on silica chips
approximating 1 cm² (24). Although arrays of this capacity are
unlikely ever to be required for conventional diagnostic purposes,
we can anticipate that the ability to simultaneously measure many
substances in the same sample will have revolutionary consequences
in medicine and other similar areas. In addition, such techniques
may ultimately permit the individual analysis of the multiple
isoforms of certain heterogeneous analytes (e.g., the glycoprotein
hormones), such molecular heterogeneity currently presenting a
major obstacle to the standardization and interpretation of many
immunological measurements (25). Moreover, although these concepts
have been illustrated in an immunoassay context, they are clearly
applicable to all "binding assays," including those relying on the
use of DNA probes, hormone receptors, etc. For example, labeled
lectins that are specific in their reactions with the sugar
residues in the oligosaccharide chains of glycoprotein molecules
may be used, together with specific antibodies, to impart
additional "structural specificity" to sandwich assays (26, 27),
possibly overcoming [...] of antibodies per se in regard to
differentiation of the glycosylated variants of the glycoprotein
hormones.

Fig. 13. Response curve in a dual-labeled microspot [...] assay
(x-axis: TSH concentration, mU/L)

Summary and Conclusion

Because of past confusion regarding the concepts of precision,
sensitivity, accuracy, etc., several erroneous concepts have
become incorporated within currently accepted rules of immunoassay
design. In particular, much higher antibody concentrations are
customarily used than are necessary to achieve very high assay
sensitivity, provided that certain measurement strategies are
adhered to. In this presentation we have attempted to show that,
in principle, the highest sensitivities are obtained by confining
a small number of sensor antibody molecules onto a very small area
in the form of a microspot and measuring their occupancy by an
analyte by using very high-specific-activity "developing" antibody
probes, thereby maximizing the signal/noise ratio in the
determination of sensor antibody occupancy. This observation,
which contradicts currently accepted immunoassay design theory, in
turn makes possible the measurement of an unlimited number of
different analytes on a chip of very small surface area through
the use of, e.g., laser scanning techniques closely analogous to
those used in compact disk techniques of sound recording.
Extensive experimental studies in this area, albeit conducted with
relatively crude techniques and instrumentation not specifically
de-
[57] ABSTRACT

A method for determining the ambient concentrations of a plurality
of analytes in a liquid sample of volume V liters comprises:

loading a plurality of different binding agents, each being
capable of binding specifically and reversibly an analyte of
interest, onto a support means at a plurality of spaced apart
locations such that not more than 0.1 V/K moles of each binding
agent are present at any location, where K liters/mole is the
equilibrium constant of each such binding agent;

contacting the loaded support means with the sample to be
analyzed, such that each of the spaced apart locations is
contacted in the same operation with the sample, the amount of
sample liquid being such that only an insignificant proportion of
any analyte present in the sample becomes bound to the binding
agent specific for it; and

measuring a parameter representative of the fractional occupancy
by the analytes of the binding agents at the spaced apart
locations by a competitive or non-competitive assay technique,
using a labelled site-recognition reagent for each binding agent
capable of recognizing either the unfilled binding sites or the
filled binding sites on the binding agent, which enables the
amount of said reagent in the particular location to be measured.

A device and kit for use in the method are also provided.
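The loading limit of 0.1 V/K moles per location can be turned into
a number of binding sites for a given sample volume and equilibrium
constant. The following is a minimal sketch of that arithmetic
only; the function name, the 1 ml sample volume and the particular
K values are illustrative assumptions, and the result counts
binding sites (a binding agent with two sites per molecule would
correspond to half as many molecules).

```python
AVOGADRO = 6.02214076e23  # binding sites per mole


def max_binding_sites(volume_l, k_l_per_mol, fraction=0.1):
    """Upper limit on binding agent per location: fraction * V / K.

    Returns (moles of binding sites, number of binding sites).
    """
    moles = fraction * volume_l / k_l_per_mol
    return moles, moles * AVOGADRO


if __name__ == "__main__":
    sample_volume_l = 1e-3  # assumed 1 ml sample
    for k in (1e8, 1e11, 1e13):  # assumed equilibrium constants, liters/mole
        moles, sites = max_binding_sites(sample_volume_l, k)
        print(f"K = {k:.0e} L/mol: {moles:.1e} mol = {sites:.1e} binding sites")
```

Passing `fraction=0.01` gives the corresponding figure for a
stricter limit of 0.01 V/K.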

17 Claims, 1 Drawing Sheet 
DETERMINATION OF AMBIENT CONCENTRATIONS OF SEVERAL ANALYTES

This is a continuation of co-pending application Ser. No.
07/460,878, filed as PCT/GB88/00649, Aug. 5, 1988.

FIELD OF THE INVENTION

The present invention relates to the determination of ambient
analyte concentrations in liquids, for example the determination
of analytes such as hormones, proteins and other naturally
occurring or artificially present substances in biological liquids
such as body fluids.

BACKGROUND OF THE INVENTION

I have proposed in my International Patent Application WO84/01031
to measure the concentration of an analyte in a fluid by
contacting the fluid with a trace amount of a binding agent such
as an antibody specific for the analyte in the sense that it
reversibly binds the analyte but not other components of the
fluid, determining a quantity representative of the proportional
occupancy of binding sites on the binding agent and estimating
from that quantity the analyte concentration. In that application
I point out that, provided that the amount of binding agent is
sufficiently low that its introduction into the fluid causes no
significant diminution of the concentration of ambient (unbound)
analyte, the fractional occupancy of the binding sites on the
binding agent by the analyte is effectively independent of the
absolute volume of the fluid and of the absolute amount of binding
agent, i.e. independent within the limits of error usually
associated with the measurement of fractional occupancy. In such
circumstances, and in these circumstances only, the initial
concentration [H] of analyte in the fluid is related to the
fraction (Ab/Ab0) of binding sites on the binding agent occupied
by the analyte by the equation:

    Ab/Ab0 = Kab[H]/(1 + Kab[H])

where Kab (hereinafter referred to as K) is the equilibrium
constant for the binding of the analyte to the binding sites and
is a constant for a given analyte and binding agent at any one
temperature. This constant is generally known as the affinity
constant, especially when the binding agent is an antibody, for
example a monoclonal antibody.

The concept of using only a trace amount of binding agent is
contrary to generally recommended practice in the field of
immunoassay and immunometric techniques. For example, in such a
well-known work as "Methods in Investigative and Diagnostic
Endocrinology", ed. S. A. Berson and R. S. Yalow, 1973 at pages
111-116, it is proposed that in the performance of a competitive
immunoassay maximum sensitivity of the assay is achieved if the
proportion of the "tracer" analyte that is bound approximates to
30%. In order to achieve such a high proportion of bound analyte
the theory of Berson and Yalow, to this day generally accepted by
other workers in the field, requires that the concentration of
binding agent (or, strictly speaking, of binding sites, each
molecule of binding agent conventionally [...] therefore be
greater than or equal to V/K. A binding agent which is a
monoclonal antibody may, for example, have [...] 10¹¹ liters/mole
for the specific antigen to which it binds, so that under the
above generally accepted practice a binding agent (or site)
concentration of the order of 10⁻¹¹ moles/liter or more is
required for binding agents of such an equilibrium constant and,
with liquid sample volumes of the order of 1 milliliter, the use
of 10⁻¹⁴ or more mole of binding agent (or site) is conventionally
deemed necessary. Avogadro's number is about 6 × 10²³ so that
10⁻¹⁴ mole of binding site is equivalent to more than 10⁹
molecules of binding agent even assuming that the binding agent
possesses two binding sites per molecule. For specific binding
agents of the very highest affinity K is less than 10¹³
liters/mole so that conventional practice requires more than 10⁷
molecules of binding agent, whereas binding agents of lower
affinity of the order of 10⁸ liters/mole lead to the use of more
than 10¹² molecules according to conventional practice. In fact
all immunoassay kits marketed commercially at the present time
conform to these concepts and use an amount of binding agent
approximating to or, more frequently, [...] of V/K; in certain
[...] relying on the use of labelled antibodies it is conventional
to use as much binding agent as possible, [...] proportions of
[...].

With the binding of substantial proportions, for example 50%, of
the analyte in the liquid samples under test in such systems, the
fractional occupancy of the binding sites of the binding agent is
not independent of the volume of the fluid sample, so that for
accurate quantitative assays it is necessary to control accurately
the volume of the sample, keeping it constant in all tests,
whether of the sample of unknown concentration or of the standard
samples of known concentration used to generate the dose response
curve. Furthermore such systems also require exact control of the
amount of binding agent present in the standard and control
incubation tubes. These limitations of present techniques are
universally recognised and accepted.

GB Patent Application 2,099,578A discloses a device for
immunoassays comprising a porous solid support to which antigens,
or less frequently immunoglobulins, are bound at a plurality of
spaced apart locations, said device permitting a large number of
qualitative or quantitative immunoassays to be performed on the
same support, for example to establish an antibody profile of a
sample of human blood serum. However, although the individual
locations may be in the form of so-called microdots produced by
supplying droplets of antigen-containing solutions or suspensions,
the number of moles of antigen present at each location is
apparently still envisaged as being enough to bind essentially all
of the analyte (e.g. antibody) whose concentration is to be
measured that is present in the liquid sample under test. This is
apparent from the fact that the quantitative method used in that
application (page 3, lines 21-28) involves known amounts of
immunoglobulin being applied to the support; but this means that,
in the samples being tested, essentially every molecule [...]
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SUMMARY OF THE INVENTION 
The present invention involves the realisation that the use of
high quantities of binding agent is neither necessary for good
sensitivity in immunoassays nor is it generally desirable. If,
instead of being kept as large as possible, the amount of binding
agent is reduced so that only an insignificant proportion of the
analyte is reversibly bound to it, generally less than 10%,
usually less than 5% and for optimum results only 1 or 2% or less,
not only is it no longer necessary to use an accurately
controlled, constant volume for all the liquid samples (standard
solutions and unknown samples) in a given assay, but it is also
possible to obtain reliable and sometimes even improved estimates
of analyte concentration using much less than V/K moles of binding
agent binding sites, say not more than 0.1 V/K and preferably less
than 0.01 V/K. For a binding agent having an equilibrium constant
(K) for the analyte of 10¹¹ liters/mole and samples of
approximately 1 ml size this is approximately equivalent to not
more than 10⁸, preferably less than 10⁷, molecules of binding
agent at each location in an individual array. If the value of K
is 10¹³ liters/mole the figures are 10⁶ and 10⁵ molecules
respectively, and if K is of the order of 10⁸ liters/mole they are
10¹¹ and 10¹⁰ molecules respectively. Below 10[...] molecules of
binding agent at a single location the accuracy of the measurement
would become progressively less, as the fractional occupancy of
the binding agent sites by the analyte would be able to change
only in discrete steps as individual sites become occupied or
unoccupied, but in principle at least the use of as low as 10
molecules would be permissible if an estimate with an accuracy of
10% is acceptable. Practical considerations may give rise to a
preference for more than 10[...] [...].

It will be appreciated that the abovementioned GB patent
application 2,099,578A, which for quantitative estimation relies
on large amounts of binding agent and essentially total
sequestration of all analyte, fails to recognise the advance
achieved by the present invention, which instead relies on a
different analytical principle requiring measurement of the
fractional occupancy of the binding agent and which thus requires
only a very low proportion of the total analyte molecules present
to be sequestered from the sample.

Following the recognition that the use of such small amounts of
binding agent is permissible, it becomes feasible to place the
binding agent required for a single concentration measurement on a
very small area of a solid support and hence to place in
juxtaposition to one another but at spatially separate points on a
single solid support a wide variety of different binding agents
specific for different analytes which are or may be present
simultaneously in a liquid [...] be analysed will cause each
binding agent spot to take up the analyte for which it is specific
to an extent (i.e. fractional binding site occupancy) [...] of the
analyte concentration in the liquid, provided only that the volume
of solution and the analyte concentration therein are large enough
that only an insignificant fraction (generally less than 10%,
usually less than [...]%) of the analyte is bound to the point.
The fractional binding site occupancy for each binding agent can
then be determined using separate site-recognition reagents which
recognise either the unfilled binding sites or filled binding
sites of the different binding agents and are labelled with
markers enabling the concentration levels of the separate reagents
bound to the different binding agents to be measured, for example
fluorescent markers. Such measurements may be performed
consecutively, for example using a laser which scans across the
support, or simultaneously, for example using a photographic
plate, depending on the nature of the labels. Other imaging
devices such as a television camera can also be used where
appropriate. Because the binding agents are spatially separate
from one another it is necessary to use only a small number of
different markers [...] marker label throughout and to scan each
binding agent location separately to determine the presence and
concentration of the label. By use of the invention considerably
more than 3 analyses can be performed with a single exposure of
the solid support with liquid to be analysed, for example 10, 20,
30, 50 or even up to 100 or several hundreds of analyses.

[...] the present invention provides a method for determining the
ambient concentrations of a plurality of analytes in a liquid
sample of volume V liters, comprising:

loading a plurality of different binding agents, each being
capable of reversibly binding an analyte which is or may be
present in the liquid sample and is specific for that analyte as
compared to the other components of the liquid sample, onto a
support means at a plurality of spaced apart locations such that
each location has not more than 0.1 V/K moles of a single binding
agent, where K liters/mole is the equilibrium constant of the
binding agent for the analyte;

contacting the loaded support means with the liquid sample to be
analysed such that each of the spaced apart locations is contacted
in the same operation with the liquid sample, the amount of liquid
used in the sample being such that only an insignificant
proportion of any analyte present in the liquid sample becomes
bound to the binding agent specific for it; and

measuring a parameter representative of the fractional occupancy
by the analytes of the binding agents at the spaced apart
locations by a competitive or noncompetitive assay technique using
a site-recognition reagent for each binding agent capable of
recognising either the unfilled binding sites or the filled
binding sites on the binding agent, said site-recognition reagent
being labelled with a marker enabling the amount of said reagent
in the particular location to be measured.

The invention also provides a device for use in determining the
ambient concentrations of a plurality of [...] binding agents,
each binding agent being capable of reversibly binding an analyte
which is or may be present [...] and is specific for [...] other
components of the liquid sample, [...] where K liters/mole is the
equilibrium constant of that binding agent for reaction with the
analyte to which it is specific.

A kit for use in the method according to the invention comprises a
device according to the invention, a plurality of standard samples
containing known concentrations of the analytes whose
concentrations in the liquid

5,432,099

sample are to be measured and a set of labelled site-recognition
reagents for reaction with filled or unfilled binding sites on the
binding agents.

In arriving at the method of the invention, I have found that,
generally speaking, for antibodies having an affinity constant
K liters/mole for an antigen, the relationship between the antibody
concentration and the fractional occupancy of the binding sites at
any particular antigen concentration and the relationship between
the antibody concentration and the percentage of antigen bound to
the binding sites at any particular antigen concentration follow the
same curves provided that the antibody concentrations and the antigen
concentrations are each expressed in terms of fractions or multiples
of 1/K.

BRIEF DESCRIPTION OF THE DRAWINGS

The principle underlying the method of the invention may be better
understood by reference to the accompanying drawing which is a graph
representing two sets of curves plotting the relationship between
antibody concentration and the fractional occupancy of the binding
sites at certain prescribed antigen concentrations and the
relationship between antibody concentration and the percentage of
antigen bound to the binding sites at the same prescribed antigen
concentrations. Each curve relates to the antibody concentration
[Ab], expressed in terms of 1/K, plotted along the x-axis. For the
set of curves which remain constant or decline with increasing [Ab],
the y-axis represents the fractional occupancy (F) of binding sites
on the antibody by the antigen; for the second set, the y-axis
represents the percentage (b%) of antigen bound to those binding
sites. The individual curves in each set represent the relationships
corresponding to four different antigen concentrations [An] expressed
in terms of K, namely 10/K, 1.0/K, 0.1/K and 0.01/K. The curves show
that as [Ab] falls, F reaches an essentially constant level, the
value of which is dependent on [An].

DETAILED DESCRIPTION

The choice of a solid support is a matter to be left to the user.
Preferably the support is non-porous so that the binding agent is
disposed on its surface, for example as a monolayer. Use of a porous
support may cause the binding agent, depending on its molecular size,
to be carried down into the pores of the support where its exposure
to the analyte whose concentration is to be determined may likewise
be affected by the geometry of the pores, so that a false reading may
be obtained. Porous supports such as nitrocellulose paper dotted with
spots of binding agent are therefore less preferred. Unlike the
supports used in GB 2,099,578A, which seem to need to be porous
because of the large number of molecules to be attached, the supports
for use in the present invention use much smaller quantities and
therefore need not be porous. The non-porous supports may, for
example, be a plastics material or glass, and any convenient rigid
plastics material may be used. Polystyrene is a preferred plastics
material although other polyolefins or acrylic or vinyl polymers
could likewise be used.

The support means may comprise microbeads, e.g. of such a plastics
material, which can be coated with uniform layers of binding agent
and retained in specified locations, e.g. hollows, on a support
plate. Alternatively the material may be in the form of a sheet or
plate which is spotted with an array of dots of binding agent. It can
be advantageous for the configuration of the support means to be such
that liquid samples of approximately the volume V liters are readily
retained in contact with the plurality of spaced apart locations
marked with the different binding agents. For example, the spaced
apart locations may be arranged in a well in the support means, and a
plurality of wells, each provided with the same group of different
binding agents in spaced apart locations, can be linked together to
form a microtiter plate for use with a plurality of samples.

When the support means is to be used in conjunction with a measuring
system involving light scanning, the material, e.g. plastics, for the
support is desirably opaque to light, for example it may be filled
with an opacifying material which may inter alia be white or black,
such as carbon black, when the signals to be measured from the
binding agent or the site-recognition reagent are light signals, as
from fluorescent or luminescent markers. In general, reflective
materials are preferred in this case to enhance light collection in
the detecting instrument or photographic plate. The final choice of
optimum material is governed by its ability to attach the binding
agent to its surface, its absence of background signal emission and
its possession of other properties tending to maximise the
signal/noise ratio for the particular marker or markers attached to
the binding agent situated on its surface. Very satisfactory results
have been obtained in the Examples given below by the use of a white
opaque polystyrene microtiter plate commercially available from
Dynatech under the trade name White Microfluor microtiter wells.

The binding agents used may be binding agents of different
specificity, that is to say agents which are specific to different
analytes, or two or more of them may be binding agents of the same
specificity but of different affinity, that is to say agents which
are specific to the same analyte but have different equilibrium
constants K for reaction with it. The latter alternative is
particularly useful where the concentration of analyte to be assayed
in the unknown sample can vary over considerable ranges, for example
2 or 3 orders of magnitude, as in the case of HCG measurement in
urine of pregnant women, where it can vary from 0.1 to 100 or more
IU/ml.

The binding agents used will preferably be antibodies, more
preferably monoclonal antibodies. Monoclonal antibodies to a wide
variety of ingredients of biological fluids are commercially
available or may be made by known techniques. The antibodies used may
display conventional affinity constants, for example from 10^8 or
10^9 liters/mole upwards, e.g. of the order of 10^10 or 10^11
liters/mole, but high affinity antibodies with affinity constants of
10^12-10^13 liters/mole can also be used. The invention can be used
with such binding agents which are not themselves labelled. However
it is also possible and frequently desirable to use labelled binding
agents so that the system binding agent/analyte/site-recognition
reagent includes two different labels of the same type, e.g.
fluorescent, chemiluminescent, enzyme or radioisotopic, one on the
binding agent and one on the site-recognition reagent. The measuring
operation then measures the ratio of the intensity of the two signals
and thus eliminates the need to place the same amount of labelled
binding agent on the support when measuring signals from standard
samples for calibration purposes as when measuring signals from the
unknown samples. Because the system depends solely on measurement of
a ratio representative of binding site occupancy, there is also no
need to measure the signal
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ing bmdmg ageno onto supports such as tubes, for «. ofsnlall molecular size sS « SvrS u^S 
ample by contacting each spaced anart location nn th. *. ^ myroxmt { 1 4j, unoccu- 

allowing them to remain m contact for a oeriod^f tim, m f? ? , ... ? , mokcuie -«-8- « Proick 
before Washing the drops away A rouEhk J™Z molecule-whtcb is directly or indirectly labelled. The 
small fraction of the SK M^iSSfa ESS sttt-recogmoon reagents may be labelled directly or 

becon.es adsorbed cntot^Vup^LT ^of S flutetc^h *?—■ 1 labds »* « 

procedure. It is to be noted that th* Z,VJ?7L!"~ r ™°! esc **< rhodamme or Texas Red or materials usable 
ES?Z£ o^^^^^TJ/,^ „ W r e J r "° lved pukcd *«««« such as europium 
than L g Z>** ^co™^^ " Tex ^^t^t^-r^ BVM,iOMlLn - 
coated tubes; the reduction in the number of moieS« radioTsoto^b^^ c ^ cnu ^ nesc « lt . «"y»e or 
on each spot may be achieved solelv bv reductfo. ofTi*. rad,0,so,0 P. ,c labds may be used if appropriate. Each 
size of the spot rather ^*e coating SSTkfi SThT f Z? aMy labeUed ^ 
coating density is generally d-fa35KS£ B 'S 20 SlS^SSLSS ^ *" t^*" 
nal/noise ratios. The sizes of the soots are^vJu . """I"?*?^ 1 "agents may be specific 

geously less than 10 mm' preferably^ JLTSrf .? 8 ' 7* ° f b,Bdm * a 8"^y" spots in 
The separation is des^lyKo nicely 2™ 3 rivi^^" " * ""^ c*c«m«a»c^ i7with 
times tbTradius of the spot, or ^ £1" fcCT TT" ^ 88 HCG "H which 
geometries can nevenhetes be cb^ged^ re^S 25 Sen "abU tc "** ^ST"^ 

being subject solely to the limitations L the numbS one of E LTt? ^ hmdm8 " 

binding agent molecules in each sdol the mini™,™ i .u »e spots. 

volumf ofthesampletowmchlS^ofsPO^wS^ J"^ -"T tedmiqUe ?? ^csenutive of 

eaposed and the means locally JSeT?™ ^,n?« „^ ° f ^ bia ^6 »8«t in the test 

JJJ paring an array of spots in the » 30 Sr^ ^ 

in case of antibodies Jbi2K£f with7££ j2 Wgether ' P rovided *■» 

tion containing albumen or other JroSo muAtfal 35 !2>£ " PrCSCDt m ,0Be of standari 

remaining non-specific adsorption siteTm the ^ P Fracuonal occupancy may be measured by 

and elsewhere. To«rJu^^t the Imo^t of S -^f * binding * tes <« with an ami- 

agent in an individual spot wtf be 1« Znle^f or unoccupied binding sites (as with 

mum amount «L1 V/^ fe ^ to cS™ toTe" £2 0 ^ ,0 ^ ,c , aBt,bod y^o»« converse of the 

principle of the present invention, the a^Tof binS' 40 the "T^ 5 ' " b d ^f aMe ,0 measure 

ing agent present on any individual site^ be checked wS^. ^ d ? S ", t0 " ro » change » 

by labelling the binding agent with Ta££££S£ ° f 0 - r 01 K Proportionately greater 

of known specific activity (i.e. l^wn amc^nt of ^ ~ S^ ih ^ fraCUOnaJ in the 

per unit weight of binding agent) anT me^uSg £ ^1^ alteraa0ve " «««"y 

amount of marker present Thus, if the use of labelled « .v,,. -—u^j- . , 

binder is not desired on the solid "Tlu ttat embodlment of the present invention which 

method of the ^oc thrbirfdtgS^v^ ll^ T fluores « nt *e measurement of 

tions to those found in thanrial to rive^ to cot- „ the buldln 8 a « cnt 300 ^e other on the site recog. 

loadings of binding agent ctTbe ^^. f mn ° D M V b = carried out by a laser scanning 

belledfind-ng aS X s^mTbe acS S " MRcl^t „ 8 Lasersh ^ 

The ntinimum size of the liquid sWle fV* Z^t , ^ 5 °°' a ^f' e from B, °- Rad Laboratories Ltd., 

correlated with the number Kjrft&JXS ™1 1^1 'J^ Ch ^ t] Election system. This instru- 

(less than 0.1 V/K) so that only an insignificCwo^ ™ * ^ '° scan the dots or the like 

tion or the analyte present mTLZdSoV „ on the support to cause fluorescence of the markers and 

bound to the ^STg^LlS7 T St^ a IT ?^ / Jte " ,0 aad measure the 

general rule less than 10% imiaMv P EF£?£ a * of fluorescence Tuae-resolved fluo- 

desirably 1 or 2W lesfdeSg'ol Se^acclS ^c^t^ ^ ^ 

desired for the assay foeatM ^ccuncv ^J£l l^^I crosstalk) between the two channels can be com- 

o^rStamTiS wh^uS^fvf^ f0r by SUx,dard corrections if it occurs or 

aXe^bou^fftte^SS «r°^ ° f 60 conventional efforts can be made to reduce it. DiscrimL 

one or a few ml r tes. e.g. dowV. tolOT I^erT » 2^1 * dkd Sp0tS " acc omphsbed in the present form 

less, are often preferred, b« Tcu^c^nS " i^a^nw' " T^"^ ° f ^^ubing 

when larger volumes are more conveaientl™^ 6< iSsJ^ ^ w » vd «g* of the two fluorescent 

and the geometry may be adjusted ^^7^ S33S bTXV l""^ my »* dis - 

sample may be used at its natural concentrati n Welor dS, B n»« P / characteristics such as 

if desired it may be diluted to a known J? ™ff nuor f t » cencc ^ ^ bleaching times. 

cnenL «c . «>d any of these means may be used, either alone or 
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in combination, t differentiate between two f\ nn ^ 10_ 

pbores and hence penult mczsuT^Ztf ."^ nucrograms/ml is fonned andQJ r 

standard samples are used. 0 10 PMspnate buffer and then thev axe fin^^vtL 7J! 

h *e case of ofter labels, such as radioisotopic la- ° f ' '* KffifffcSE? 

J± ^ anfl Tf lesCea ! Ubd » « «2yme labels. Mate- » '«dual biudinglhesS. 1b!Z£ 

gous means of duunguishir.g the udividual signi,"^ ?£ reafter ^ey are w «bed agair! ahosS 

one or from each of a pair of such labels are^so weU i< Phwpbate 

taown. For example two radioisotopes such as usj and ^waiting pl«e has in each of its wells two »™ 

"'1 may be readily distinguished on the basis of 2 ° far " ^proximately 1 mm*. MaZ^'^ 

The invention .nay be used for the assaying of ana 25 ^ ' * ^ """^ 

lytes present in biological fluids, for «TmTif ? 

body fluids such as btood, serum?"^^ Urine S EXAMPLE 2 

present artificially such a, ti °*- «*^£*^i£^ Z"? ^ 

contentt of which i are incorporated herein by reference nacrosco P t From the standard whtriowf ^ 

EXAMPLE 1 « 

An anti-TNF (tumour necrosis factort «,t,w^ w tnf eo»eo.tT,uon " rrrc fiu 0 r~~3T~ 

ing an affinity constant for TOT « r ^ Ked »— " 00 ™* "« 

1 X10» lisers/n^ole » «„£ ™J « £, It? Sf ~ 

Uontf the antibody a, . concentration of 80 tc^- 60 

grams/ml is fonned and 03 microliter aliquots of tht 70 

J^SS^^S^^ ™« and »°* HCG being as follows: 



O02 

02 *' 
7J 
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— : : • contmue< ! m lhe , urine of women in pregnancy testing, giving good 

HCGooncttirantm fttc Bnm^ correlations with results btainedby tenant tad 

±S -l-hd*-— ° a " CC ^ thieving effective concentration »e^eSe^ 

» £■ 5 ^^Zl^ aCa1natm . n ? gc ° Ttwoor orders 



of magnitude by correct choice of the best HCG spot 

and dose response curve. 
The artificially produced solution was found to give Production of labelled antibodies 

^ ™T l 1 * 1 ^ 10 " 5 00 *" The ^belling of the antibodies with fluorescent labels 

^ Immunology", Blackwcll Scientific Publications (1 980) 

EXAMPLE 3 M* 11-13, for example as foDows: 

Using similar procedures to those ouUined in Exam. « m ° noclo f aJ "tibody anti-FSH 3G3, an FSH 
pie la microliter pkte containing spots of labelled anti- 15 Wcafc^mbumi) antibody having an affinity con- 
T4 (thyroxine) antibody (affinity constant about ^IS, 1JX W htm mole . produced in 
IX 10" liters/mole at 25* C ), labelled anti.TSH (thy. ^ lddlcsex Hc »P>tal Medical School and was la- 

roid stimulating hormone) antibody (affinity constant I?™ ^ TRITC (rhodamine isothiocyanate) or 
about 5x10* liters/mole at 25* C) and labelled anti-T3 Texis Red ' a red fluorescence, 

(triiodothyronine) antibody (affinity constant about 20 Thc monoclonal antibody anti-FSH 8D10 a cross 
1 X .10" liters/mole at 25* C) in each of the individual reacting (alpha subunh) antibody having an affinity 
^ * P* 0 *™** the spots containing less than «»«■* (K) of 1x10" liters per mole, was likewise 
;£;JL n m , °f aatl ' T4 or ^ than produced in the Middlesex Hospital Medical School 

Thl " 25 ^ giving a yellow-green fluorescence. 

for™ e ^ ^ gCnCral pr0Ccdurc ^ -^55 -ites fluid 

a^yVon^ ^f\^n (ammonium sulphate precipitation and 

C This antibody is labelled wTti flJS^Sm S^ 10 ^^ by laW ^ aCCOrd ' 

The site-recognition reagents for the SHs *> 1^^°^ . B 
a£ T4 and T3 coupled to polylysine and labelled^ } sulphate purification 

FTTC and they recognise the unfilled sites on their , < , : i ^ saluratcd ammonium sulphate solution 
respective first antibodies. 10 3 ml utl preparation (culture supernatant or 1 ;5 

Using 400 microliter aliquot* of standard solutions dUutCd ascitcs fluid) undcr constant stirring (45% saw- 
containing various known amounts of T4,T3 and TSH 35 TZUon ^ 

dose response curves are obtained by methods analo-' 2 * Contijlue stir ™g for 30-90 min. Centrifuge at 2500 
gous to those in Example 2, correlating fluorescence for 30 

ratios with T4, TV and TSH concentrations. The plate is 3 * Discard the supernatant and dissolve the orecini- 
used to measure T4 T3 and TSH levels in serum from totc * PBS (final volume 5 ml.). Repeat Steps 1 and 2. 
human patients with good correlation with the results 40 OR. ^ A 

obtained by other methods. 4 Add u m i MtllMt .j 

Aaa mJ saturated ammonium sulphate (40% 
EXAMPLE 4 saturation) under constant stirring. Repeat Step 2. 

Using similar procedures to those outlined in Exan, toS^^T*"** 1 ^ *' pd,Ct " 

SiSS^^ 45 J" D ^ e » «*> against the same buffer 

ters/mole m 2T & aSd^SS aScG * *?S21h52 1'" ^ 

body (affinity constant about Ux 10«" liten/mole at Ji D T ctennjne * e Pf° leul concentration either at Aao 

25' C) and labelled anti-FSH (foSe stin^g ho " ? b T w 

mone)antftody(affiiih^coiistantab^^ *> ,7 J?!' Chroma,0 P»Phy: (Buffer: 1M Tris-CI. pH 

mole at 25' C) in each of the individual wells is pro- ^ $Ulpha,e) 

dnced, the spots each containing less than 0 1 V/K 8SCIles fluid by ""tnfugation at 4000 
moles of the respective antibody. A cross-reacting 

(alpha subunit) monoclonal antibody 8D10 with an Add ,M Tns " ci *ol»tion to achieve final concen- 

aflmhy constant of lx 10" liters/mole is used as a com- 55 ««*»of0.1M. 

rnon developing antibody for both the HCG and the 3 " Add svSTlcie ™ amount of solid potassium sulphate. 

FSH assays. Final concentration:=0.5M. 

Using 400 microliter aliquots of standard solutions 4 " A PP ] y °* 1"»d to the T-gel column 

crataining vanous known concentrations of HCG and 5 - Wash the column with 0. 1 M Tris-Q buffer contain- 

eSH, dose response curves are obtained by methods «0 m & 0.5M potassium sulphate, until protein profile fat 

analogous to those m Example X correlating fluores- A 280 ) returns to zero. P P ( ' 

cence ratios with HCG and FSH concentrations, the Elute the absorbed protein usine 0 1M Tri^l 

curve obtained with the higher affinity anti-HCGanti- buffer as the duant. * 

™^ concentration-sensitive results at the 7. Pool the fractions containing antibody activity and 

tion^ensitive at the higher HCG c^ncenuati^ The HWiT clES,^ uT * ™ 
Plate is used to measure HCG and FSH co^tSl 2^7^^^^^ 
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JJST C " UU8a ***** Uoa unconju - T ofsaid **** ^ » <°°£ 

^ ^ chromatogr^hy for TRJTC- "> to for the ana.ytes whose eoJS2STJiff£ 
/n j c iioci. tenmned 

B f"^ : . . »: A as claimed fa, claim 1, wherein said 

PBS for (a). binding agents are labelled with markers enafalinc *I 

° M 2i J^ 0, ^f te • pH 8 0 Md 018M Phosphate, ,. «»««»&» levels of said binding agents to be ma! 

pH 8.0 for (b). ** surra. 

11. A method as claimed in claim 10, wherein said 
2.nxon ,4 W ,» b » dm 8 agents and said site-recognition reagents are 

O-DJio mn — ujj x OJ3.495 nm labelled with fluorescent markers such that at the ir^i 

, n SpCtS Ule technique for measuring frac- 

1 ^ 20 U0 ? al o~upa»cy of the Wndikg agents memresfte 

1. A method for determining the ambient concentre- ?' ! he .»l»»l a emitted by the fluorescent markers, 

tions of a plurabty of analytes in a liquid sample of • f for 086 m determining the ambient con- 

volume V liters, comprising: centranons of a plurality of analytes in a liquid sample 

loading a plurality of different binding agents, each m £!j°JT Y J"?' "^P™** a solid support means 
being capable of reversibly binding an «aMe '°f ted thercOD * «k* bating density at a 
which is or may be present in the liquid sample and fi£2£ ap *? J™" *P°* ■ Pi""** of differ- 
Bspeoficforsaidanalyteascomparedtotheother reverS bfe^» b » d mg agent being capable of 
components of the liquid sample onto a SU m«« re yersioly binding an analyte which is or may be pres- 

tneans at a plurality of spacXapaTsnU 30 a? ^l*" 1 * 5pedfic •* «R 

such that each spot has a high ccL^hTS s^SKsnVS* * C ° mp0nenU of «•*» "fluid 
one of -id binding .genu but not moreZnVl ofa^de bS ^ ""'f^ 01 V/K «ol« 
V/K motes of binding agent are nr««t ™ °lr . bmdm / where K liters/mole is the 

said binding agent for said analyte: 35 n a w . . ? 11 14 specific 

such that only an insignificant proportion of any 40 «• A kit for use in determining the ambient eo™,™ 
analyte present m said liquid sample becomes ™ion of a plurality of 32k n? houfc nLSX 
bound to said binding agent specific for said ana- volume V liters, comprising: * ° f 

a parameter representative of the frac- ^^T^S^^^^ 
uonal occupancy by said analytes of said binding 43 spots a plurality of different birJmTagenS ^ch 

JS/'.lV? 01 ! ^ ' C ° mpctitivc « »°«~<*>- billdin f a 8« t ^8 «Pable of reversibly binding 

petrtive assay technique using . site-recognition M ""Jy* which is or may be present in said liquid 

reagent for each binding agent capable of recognii- """P'e a specific for said analyte as compared 

mg either the unfilled binding sites or the filled .„ 10 the other components of the liquid sample. «ch 

binding sites on said binding agent, said she-recog- spo1 ^ not more than 0.1 V/K moles of a 

moon reagent being labelled with a marker en- s £ gle bindin 8 agent, where K liters/mole is the 

ablmg the amount of said reagent in the particular 4 " uu,y con$tant of said single binding agent for 

location to be measured. reaction with the analyte to which it is specific: 

2. A method as claimed in claim 1, wherein each of « 8 plurah,y of standard samples containing known 
said spots has a size of less than 1 mm J . 55 concentrations of the analytes whose concentra- 

3. A method as claimed in claim 2, wherein each of UOnS f !" the ***** ""Pi* are to be measured; and 
said spots contains more than 10* molecule of bindms * ? «r? ed s,te - r «ognition reagents for reaction 
agent. 8 with filled or unfilled binding sites on said binding 

4. A method as claimed in claim 3, wherein each of m , • , • 

Zr* ,es *■ m v/K * 3pcu ^ r.^^^^ 11 "^ ~ h of ^ 

3. A method as claimed in claim 3 wherein said hind 17. A kit as claimed in claim 16, wherein each of said 

mg .genu used have affinity cTtiu f^d^es' m ° re thaa 104 »«>»ecules of binding 

of from J0« to 10" liters per mole. « * 
ABSTRACT



A method for determining the ambient concentrations of a 
plurality of analytes in a liquid sample of volume V liters
comprises:

loading a plurality of different binding agents, each being
capable of reversibly binding an analyte which is or
may be present in the liquid sample and is specific for
that analyte as compared to the other components of the
liquid sample, onto a support means at a plurality of
spaced apart locations such that each location has not
more than 0.1 V/K, preferably less than 0.01 V/K,
moles of a single binding agent, where K liters/mole is
the equilibrium constant of the binding agent for the
analyte;

contacting the loaded support means with the liquid
sample to be analyzed, such that each of the spaced
apart locations is contacted in the same operation with
the liquid sample, the amount of liquid used in the
sample being such that only an insignificant proportion
of any analyte present in the liquid sample becomes
bound to the binding agent specific for it, and
measuring a parameter representative of the fractional
occupancy by the analytes of the binding agents at the
spaced apart locations by a competitive or non-
competitive assay technique using a site-recognition
reagent for each binding agent capable of recognizing
either the unfilled binding sites or the filled binding
sites on the binding agent, said site-recognition reagent
being labelled with a marker enabling the amount of
said reagent in the particular location to be measured.
A device and kit for use in the method are also
provided. 



17 Claims, 1 Drawing Sheet
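
The loading limit recited in the abstract (not more than 0.1 V/K moles of binding agent per location) can be illustrated with a short numerical sketch. The Python below is not part of the patent; it assumes simple 1:1 mass-action binding at equilibrium, and the function names and the example values of V and K are hypothetical choices made only for illustration. It compares the fractional occupancy at a finite binding-agent concentration with the limiting "ambient" value K[An]/(1 + K[An]) that is approached as the binding-agent concentration falls.

```python
import math


def max_binding_agent_moles(volume_liters, k_liters_per_mole, factor=0.1):
    """Loading limit per location: factor * V / K moles (factor 0.1 or 0.01)."""
    return factor * volume_liters / k_liters_per_mole


def bound_concentration(ab_conc, an_conc, k):
    """Equilibrium concentration of occupied sites for 1:1 mass-action binding.

    Solves K = B / ((Ab - B) * (An - B)) for B, taking the smaller root.
    ab_conc and an_conc are total site and analyte concentrations in mol/L.
    """
    s = ab_conc + an_conc + 1.0 / k
    return (s - math.sqrt(s * s - 4.0 * ab_conc * an_conc)) / 2.0


def fractional_occupancy(ab_conc, an_conc, k):
    """Fraction F of binding sites occupied by analyte."""
    return bound_concentration(ab_conc, an_conc, k) / ab_conc


def fraction_of_analyte_bound(ab_conc, an_conc, k):
    """Fraction of the analyte in the sample that is taken up by the sites."""
    return bound_concentration(ab_conc, an_conc, k) / an_conc


def ambient_occupancy(an_conc, k):
    """Limiting occupancy when the binding agent removes negligible analyte."""
    return k * an_conc / (1.0 + k * an_conc)


if __name__ == "__main__":
    K = 1e11   # liters/mole, illustrative affinity constant (assumption)
    V = 1e-3   # liters, i.e. a 1 ml sample (assumption)
    An = 1.0 / K  # analyte concentration expressed as a multiple of 1/K
    for factor in (0.1, 0.01):
        moles = max_binding_agent_moles(V, K, factor)
        Ab = moles / V  # equivalent site concentration in the sample
        print(factor, moles,
              fractional_occupancy(Ab, An, K),
              fraction_of_analyte_bound(Ab, An, K),
              ambient_occupancy(An, K))
```

Under these assumptions the occupancy at 0.01 V/K moles differs from the limiting value by well under one percent, which is consistent with the statement that only an insignificant proportion of the analyte becomes bound.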
DETERMINATION OF AMBIENT

10 wbch is a monoclonal andbodvm.v , 8 * gem 

' This application is a «,«._.. .• equilibrium coisuol OCI wh.vt, , r , ejtln> P le . bm u 

praciice requires more ihan 10 7 i convention*, 

BAOCCOUND OF THE INVENTION "^SS 5 

live of the poS'S?™" 8 « q«3miiy represent*. „ hbe,, «J ".tibodies i, « co^ en , 10I S ,o ^ « rf 

bindine «enL ii iZl^f ° f ahsolme ^ouo, of kee l»»8 » cons. an, „ aI1 les ! "JrJ? r J Simple ' 
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amount of binding agent* Educed f 6 "" " S P J ^ "P™ one ScTi, i 

mficMiproporUoo of the anajyte * Km Jfg 0 £ d , %£ °»>y« sa»M number of differ nSTrliSs^! 

gcneraDy less Iban 10%. usually less iban 5% «nd for Hn,C markcr hbel throughout and to scan e*A J.^ 

«^^«^l«2»«3ri«5 fc 7»Xk5 S locate separately tfdeternLe^pSneTS 

r M e ° n,roUed ' voiuLe * UbeL B > « of „ e mveK^nS 

for all the liqwd simples erably more U»an 3 analyses can be performed vX^nrie 

Tff" ) - ' ^ te * * als ° P^We toTt£ e W,"f «*«> suppor. with liauW ?SZ£fc 

reliable «d sometimes even improved estimates of analv£ ? mple 10 ' 20> * 50 or eve ° «P » 100 « aWE2£ 

ooaeatnaoD using much less d» V/K mol« o7b£ E 10 ° f 

agent binding sites, say not more Iban 0 1 V/K anri „„f. r Overall, therefore the 

ably less than 0.01 V/K. For a bidding agent Ev£ ln ' method fo ' *termLg J^bLr» " ^ ' 

biers/mole and samples or approximately 1 ml size this is comprising: y ol ,Umt v tate,s - 

JTZSE ^JE""',? m ° re ,hM 10 "- P ref «blv " »°"iing « plurality of different binding — each h,- 
kss tbw 10 molecules of binding agent at each location in ca P» ble of reversiblv binding an anal™ S ° g 
an ndmdual array, tfu* vahe of K is 10» li.ers/mole the P resenl » ^ and fs^f " f y ta 
figures are 10« and 10* molecules respectively, and if K is compared «© ibe oUier componeSl bouM L"^' " 
of the order of 10 s bters/mole they are 10» and 10" , n ■ »PI»n means at . plurX ^^f so!cI?,n^ ? WDplC ' 00,0 
mokcute respectively. Below 10> molecules of binding " *« «eb location tau not more ffc ? v/KS" "T* 
agent at a single locauon the accuracy of toe measurement binding asent where K *iL. • ? Boles of * 

woukl become progressively less ,7 the faSSS «-« «* £ Ek^aftK !£ eqUiI, ' briU,,, 
p^ofmebrndrngagentsitesbytbeanalytewouldbeS contacting the loaded ZZ,„ ^ , 
tocb«ge only m d«re,e steps as individual sites become „ sample to £ 

occupied or unoccupied, but in principle at leas, the use of * locations is ewueSk *£ s Se „ ,, ?• "* • S f ICed * pan 

, ..... specific for u, and ^* 

reuesonlaigeamountsofbindiMafieCand^ X 40alyles of » e binding agents at the 

measurement on a ve^y small m, 0 f !»«dT IOD q SaBple ° f volun,e V li,e «. comprising « TsolS 

•mlysed. Simultaneous ex^urTorLh of ^ ^ ^ t ,0 I lh " ana,y,e as "^P*"* 'o the otheTcomDo 

poinu to the bouid u, be SZ t^t^ c^T' Z n° l ^ id >™*- «ch location 

•gen. spot to Ulce up the .MlyfcloTwbfcH L ltfi ? £p» „- h ^ Prtferabiy leSS ,baD 0 01 V/K - o?" 

•n extent (i.e. fractiona. binding £ 25-^) ~ - " ^KWiSiS 6 " K r' C,S/m0,e " ^ 

tion therein « Urge eniugn & ^2"; A " ' he me,bod acwdi -6 .o the invc.ion 

U. , Generally less A. S*. iSf £ ESS 55 SSTJ.^? aOC ° rdln6 L° a Sy of 

me.n.lv te «lK,u J1 d.otbepoint.mf r ,a iool , Din S 55 ^S2^± eoau,m !* known concentrations of the 

occupancy for each binding agent can then be deie™;^ ™ . J T T °° D0en,ri " ,o ns » we liquid sample are to be 

using separate site^ecognhion reagenB which ,11° Ured 3Dd 1 Xt oI ,abe,led si"-ecognition 

different binding agents and which m bbeUed^m markerl « i 

enabhng the concentrwion levels of tbe separTreaS ,h ""^ " ,be n,elbod of lhe I b»ve found 

bound to the different binding agents to be me«u7ed fo^ ' ^T"* Sp " kin 8. antibodies bavU ao ,S 

example fluorescent markers. Such me«„«m?nTmfv t k 0 " 5 '"' ? Ul ™'™* for an antigen, th^re htioS ^ 

jSTSfr"*^ for exan,p,e -^1-^*5 P b i::: oVtt coDcen, " ,ion a ° d ^ «^£ss 

scans across the support, or simultaneously, for examole « S ^ u E " " Dy paniculai an "8eD concen- 
usmg a photographic plate, depending on me nature oHbe a L ."J k relaUonshi P «««^» 'be ^tiolly con«n- 
labels. Other unagmg devices such as a television camlra 1 " a > " i""' 886 ° f "'^ ^ '° bin ^ 

S1 ,es at any particular antigen concentration follow the samf 
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cmves provided thai the .ntibodv conceoiriiions and ih, ,. 6 

siie-recogmuoo reagent ait light si^ak^! J* 
BRIEF DESCRIPTION OF THE DRAWING s ^SSTtS^ * S"*^ m EE! 

Tteprinciple undalying ft. method of .be invention mav de,ect «8 instrument ptoSfa i ic 1 S? te C ?l le< ? 0n , ^ ^ e 

prescribed antigen concentrations Tnrf ,k , • D markers a it«h^ a ? r *** P artlc ular marker or 

expressed in terms of K, namely 10/K, 1.0/K, 0 17K «•'« °' "Mlyte to be^™^ ^ K ,be COD «™n- 

0.01/K. The curves show that as TAbl f.ik V" ^ varv over rnr,«vl u, yed w ,he >"*nowo sample can 

DETAILED DESCRIPTION th 

Use of a porous support may cause ih, h.L- • techniques. The antibodi« y ? de by known 

depending on its ntoS,,?^' b ^«. affinity* constants for^aLt 5 ™^ ^ ""ventiowl 

5=^5t~ftK 1 ^-»S « 

obtained. Porous supports such b n ^ " 8 * *"cb bindinf acln* DU ° D be 

poly™, m tTii «W ,™ S ,LS fc?2SL2f ~-*» 

samples of .pprox.matelv^v™^^^ 8 « Tie binding agents may be applied to Ih 

for use w.tb , plurality of samples. pli " e A roughly constant small iSofSEL" 

When the support means is to be used in « presem m ,be «"rop becomes adsorbed nnL^ " 8 ' 8enl 
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density is generally desilXlo m ?,Lt ^ ^ UDg or " ^^^^^^Z^Lf^T 1 ,pott 
ratios/The s Ls of £ s™c .£.£!Sv M"^ sucb » HCG •"» ^^.^SZ^?™^ 
mm*, preferably less ^ j 3 10 be cross-re.amT «.«- ^ JTbH,^ 8 "I" 

spot or more. These suggested geometries cm nevertheless^ f .■ ."^ ****** lbc ^nab represcnutrvTof the 
be changed as requtred, being su5jec , ^ •J™ fr.ct.onal occupy 0 f the binding agcm^^lw 

Uonsonibe number of binding .gent nx.kcules m cacK. hv^^ C °^ Dtralions °' '«* *-5yJ^bIoSS5 

the inaunum volume of the sampie to which the ^fof 5 l^w " * • ^/^^ ~v« otauned femluS 
spots «11 be exposed and the mem locally .v^/for 10 JSP °'f U, ?8 kn0WD »«e«>tr.iions of KSJJ 

coovemenyy prep.nng „ .„„ of ^ £ fc ^Si^^^^XS^ *» 

°~*< b.ve been coated onto the ^su^^^ 

support it is convention* practice in w«h ih* • 7 ,„ J , : ^ cs ™ al Jng occupied bixiding sues fas with 

£HLi£ ! 0,ber Pf0,ein 10 all remainine °'b«- For gre.teV ^nTit^J^ of 

non-specific adsorpuon sites on the suppon and elseX? fraction which is th^£?& ttMM *? mMsure *« 
To confira, , tba, the amount of binding a^Tn. in aViS^l' occupant S , * * o^rET, *i Cb * nge " tnc ' 

spot will be less than the nujdJL^waS Si^a cue ' siIho »8 b '** SSionreSSSS K' " 

required 10 conlorai to the principle of the » 23-75% tiito .Ik™,*™ i.~ „r£-°™ ■■•<«» 



detectable marker of known specific aciWiv fi e km™ m,ensi, y of ««* signals from me ° f reU,,ve 

.mount of marker per uni, weigh, of bSg^enO ™ agent andSe o£ on ^ SS^ 0 " "f 

measuring the amount of marker present TW iffi . , 25 a *y ** "nied out by a laser saTn£» i!??^? °° re ag e «. 

bmdmg agent on be used to apply uolabelJed bindmg age n " SeT * ^ wl ? len Pb «*■ »o °* °f 

to the supports to be actually used. 8 a8 "" «« ,be amounts of fluorescence emitted W*r«oTv2t 

The minimum size of the liouid .«m„u /v ,• v • fl "°'««nce methods may also be used in.^L , 

analyte present m the bouid sample become, toSd?£ M fll°"t"° ( ^ fflade 10 redu « «• Discrimin.don of"e rwo 

as ^^fir=* - 

The site-recognitton reagents used in the m *ih~i > , u ' • y one flu °'««m label is present the «m» 

ing to me invention may^enSes^ bV annSl " , ' h^'^" may be Used ' P rovided c« Sn «> iTn 

labels such as fluorescein, rbodamine or Te«« vZT u, respective radtoacuve emiss ons. Likewise it £ ™« 

m.,eridsusabkmume.resolvedpXflJl^ D ^ " S^. 10 *"^ *e producu of two „^e «acuC 
as europium and other lamhanide cbeU^ °"° resceDOe suc b denvmg from dual enzyme-labeUed antibody count?.. 

rad.o.sou.p.c labels may be Used if appr^ "'"""t,'* diffe ™' chemi.uminescen, Hfcu^ 

* «Uon reagen. may be ^TSS pit»^ ^ 
such as blood, serum, saliva or urine. They may be used for
the assaying of a wide variety of hormones, proteins,
enzymes or other analytes which are either present naturally
in the liquid sample or may be present artificially such as
drugs, poisons or the like.

For example, the invention may be used to provide a
device for quantitatively assaying a variety of hormones
relating to pregnancy and reproduction, such as FSH, LH,
HCG, prolactin and steroid hormones (e.g. progesterone,
estradiol, testosterone and androstene-dione), or hormones
of the adrenal pituitary axis, such as cortisol, ACTH and
aldosterone, or thyroid-related hormones, such as T4, T3
and TSH and their binding protein TBG, or viruses such as
hepatitis, AIDS or herpes virus, or bacteria, such as
staphylococci, streptococci, pneumococci, gonococci and
enterococci, or tumour-related peptides such as AFP or
CEA, or drugs such as those banned as illicit improvers of
athletes' performance, or food contaminants. In each case
the binding agents used will be specific for the analytes to be
assayed (as compared with others in the sample) and may be
monoclonal antibodies therefor.

Further details on the methodology are to be found in my
International Patent Publication WO88/01058, the contents
of which are incorporated herein by reference.

The invention is illustrated by the following Examples.

amounting to about 400 microliters, is added to one of the
wells and allowed to incubate for several hours. About 400
microliters of various standard solutions containing known
concentrations (0.02, 0.2, 2 and 20 ng/ml) of TNF or HCG
are added to other wells of the plate and also allowed to
incubate for several hours. The wells are then washed
several times with buffer solution.

As site-recognition reagents there are used for the TNF
spots an anti-TNF antibody having an affinity constant for
TNF at 25° C. of about 1x10^[illegible] liters/mole and for the HCG
spots an anti-HCG antibody having an affinity constant for
HCG at 25° C. of about 1x10^[illegible] liters/mole. Both antibodies
are labelled with fluorescein (FITC). 400 microliter aliquots
of solutions of these labelled antibodies are added to the
wells and allowed to stand for a few hours. The wells are
then washed with buffer.

The resulting fluorescence ratio of each spot is quantified
with a Bio-Rad Lasersharp MRC 500 confocal microscope.
From the standard solutions dose response curves for TNF
and HCG are built up, the figures for TNF being as follows:



TNF concentration (ng/ml)
FITC fluorescence/Texas Red fluorescence on TNF spot



EXAMPLE 1 

An anti-TNF (tumour necrosis factor) antibody having an
affinity constant for TNF at 25° C. of about 1x10^[illegible] liters/mole
is labelled with Texas Red. A solution of the antibody at a
concentration of 80 micrograms/ml is formed and 0.5 microliter
aliquots of this solution are added in the form of
droplets, one to each well of a Dynatech Microfluor (opaque
white) filled polystyrene microtitre plate having 12 wells.

An anti-HCG (human chorionic gonadotropin) antibody
having an affinity constant for HCG at 25° C. of about
6x10^[illegible] liters/mole is also labelled with Texas Red. A solution
of the antibody at a concentration of 80 micrograms/ml is formed
and 0.5 microliter aliquots of this solution are added in the
form of droplets, one to each well of the same Dynatech
Microfluor microtitre plate.

After addition of the droplets the plate is left for a few
hours in a humid atmosphere to prevent evaporation of the
droplets. During this time some of the antibody molecules in
the droplets become adsorbed onto the plate. Next, the wells
are washed several times with a phosphate buffer and then
[...] of about 400 microliters [...]
solution and left for several hours to saturate the residual
[...] buffer.

The resulting plate has in each of its wells two spots, each
of area approximately 1 mm². Measurement of the amount
[...]
about 5x10^[illegible] molecules of anti-TNF antibody and the other
contains about 5x10^[illegible] molecules of anti-HCG antibody. The
wells are designed for use with liquid samples of volume
400 microliters, so that 0.1 V/K is 4x10^-14 moles.
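
The 0.1 V/K limit quoted for these wells can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the patent text: the function name is hypothetical, and the affinity constant of 1x10^9 liters/mole is an assumption, chosen because it reproduces the 4x10^-14 mole figure given above for a 400 microliter sample (the exponent of K is not legible in this copy).

```python
# Illustrative check of the 0.1 V/K loading limit; not part of the patent text.

AVOGADRO = 6.022e23  # molecules per mole


def max_binding_agent_moles(sample_volume_liters: float,
                            affinity_constant_l_per_mol: float) -> float:
    """Upper limit on binding agent per spot, 0.1 * V / K, in moles."""
    return 0.1 * sample_volume_liters / affinity_constant_l_per_mol


# Sample volume stated in Example 1: 400 microliters.
V_LITERS = 400e-6

# ASSUMPTION: K = 1e9 liters/mole. This value is inferred from the stated
# result of 4x10^-14 moles; it is not legible in the scanned source.
K_ASSUMED = 1e9

limit_moles = max_binding_agent_moles(V_LITERS, K_ASSUMED)
limit_molecules = limit_moles * AVOGADRO

if __name__ == "__main__":
    print(f"0.1 V/K = {limit_moles:.1e} moles")
    print(f"        = {limit_molecules:.1e} molecules")
```

Under that assumed K the limit corresponds to roughly 2.4x10^10 antibody molecules per spot, which gives a scale against which the per-spot loadings described in Example 1 can be compared.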

EXAMPLE 2 

A microtitre plate prepared as described in Example 1 is
used in an assay for an artificially produced solution
containing TNF and HCG. A test sample of the solution,



0.02        1.1
0.2         4.6
2           7.9
20          423



and those for HCG being as follows:

HCG concentration (ng/ml)    FITC fluorescence/Texas Red fluorescence on HCG spot
0.02                         1.[illegible]
0.2                          7.2
2                            16.0
20                           28.2

The artificially produced solution was found to give ratio
readings of 5.9 on the TNF spot and 10.3 on the HCG spot,
correlating well with the actual concentrations of TNF (0.5
ng/ml) and HCG (0.5 ng/ml) obtained from the dose
response curves.
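
Reading an unknown off a dose response curve can be sketched as an interpolation between the two bracketing standards. The example below is a minimal illustration, not the procedure used in the patent: interpolating linearly in log10(concentration) is an assumption made here, the function name is hypothetical, and only those standard readings that are legible in this copy are used (the HCG concentrations are taken to be the same 0.02 to 20 ng/ml series stated for the standards, and the top HCG reading is taken as 28.2).

```python
import math

# Illustrative only: log-linear interpolation on a dose response curve.
# The choice of interpolation scheme is an assumption, not the patent's method.


def interpolate_concentration(reading, standards):
    """Estimate concentration from a fluorescence ratio reading.

    standards: list of (concentration_ng_per_ml, ratio) pairs, sorted so that
    both concentration and ratio increase.
    """
    for (c_lo, r_lo), (c_hi, r_hi) in zip(standards, standards[1:]):
        if r_lo <= reading <= r_hi:
            frac = (reading - r_lo) / (r_hi - r_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("reading lies outside the range of the standards")


# Legible standard readings from the tables in Example 2.
TNF_STANDARDS = [(0.02, 1.1), (0.2, 4.6), (2.0, 7.9)]
HCG_STANDARDS = [(0.2, 7.2), (2.0, 16.0), (20.0, 28.2)]

# Ratio readings reported for the artificially produced test solution.
tnf_estimate = interpolate_concentration(5.9, TNF_STANDARDS)
hcg_estimate = interpolate_concentration(10.3, HCG_STANDARDS)

if __name__ == "__main__":
    print(f"TNF estimate: {tnf_estimate:.2f} ng/ml")
    print(f"HCG estimate: {hcg_estimate:.2f} ng/ml")
```

With these inputs both estimates fall close to the 0.5 ng/ml actually present in the test solution, which is consistent with the agreement reported above.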

EXAMPLE 3 

Using similar procedures to those outlined in Example 1,
a microtitre plate containing spots of labelled anti-T4
(thyroxine) antibody (affinity constant about 1x10^[illegible] liters/
mole at 25° C.), labelled anti-TSH (thyroid stimulating
hormone) antibody (affinity constant about 5x10^[illegible] liters/
mole at 25° C.) and labelled anti-T3 (triiodothyronine)
antibody (affinity constant about 1x10^[illegible] liters/mole at 25°
C.) in each of the individual wells is produced, the spots

ESS 'I 0 *" Vmolesof an,i - T4 

lxiu V moles of anti-T3 antibody. 
The developing antibody (si.e-recogni.ion reagent) for the 

\12 ? « '« er s"°°l< « 25' C. This antibodv is 
labelled with fluorescein (FITC). The site-recognition
reagents for the T4 and T3 assays are T4 and T3 coupled to
poly-lysine and labelled with FITC, and they recognise the
unfilled sites on their respective first antibodies.

Using 400 microliter aliquots of standard solutions
containing various known amounts of T4, T3 and TSH, dose
response curves are obtained by methods analogous to those
in Example 2, correlating fluorescence ratios with T4, T3
and TSH concentrations. The plate is used to measure T4, T3
and TSH.

EXAMPLE 4 

Using similar procedures to those outline* ;„ c 
• microutre plate containing st»teoffi™T!f ™ 1 
antibody (affinity constant «h«If « ?n» {'Wed «nti-HCG 
C). secLd UbiSK£ ^ " 250 
about UxlO" liters/mole a 1 £ ™ WiSS"? °° nSUl " 
(follicle stimul.tmg bormane) amiboo\ iaffiU'i w>ti-FSH 
about 1 JxlO» litersAnole a. § " S2ch J"?™ 
wells is produced, the spots each <-o ! md,v,dna ' 
V/K moles of t^S^tS"? ^ ttl 
(alpha subunit) «o^£SZ£S^fJT' K ^ 
consunt of 1x10" Uters^kTuse^ « . c^S"? 
oping antibody for both the HCG^o Ite KH ajaays* 

Using 400 microliter aliquots of si»nrf,^^i 
Uining various known oiSStaJS HC^TfSH 
dose response curves are obtained hv m-.k T* , FSH » 

higher affinity anti-HCG antibody eivinc m„ 

means and achievi n£ effect^ J!r *"" cd ^ 0,her 
for HCG over a con^S SS? ~ T 15 
of magnitude by correa choice Tin" £ u C C " ^ 
dose response curve. HCG S P° [ and 

Production of Labelled Antibodies 
be^X.^^^ 

«E£ PuWlC4,ions (1980 >- «- sss 

tJxlC liters per m2e w5££?? C, ? ,ttw (K > of 

Toe monoclonal antibody anti-FSH 8nin . 
reacting (alpha subunit) 801^0^ .0 .* °' CTOSS- 
(K) of 1x10" liters per mok?wXew£ l^," y ?^ ao1 
Mglese, Hospital^., 'SS^Z^SSj^ 

TlJc general procedure used involved ascii« c 
cation (ammonium sulDhm* ^™ ,fic / lcs flu »d purifi- 

chromatography) fafoud bv ^hln P,Ull ° D and ^ 
followin^teVs: X UWlW8 ' *> tbe 

l.a. Ammonium suJpbate purification 
saturation). 0nSUD ' SUmn 8 

3. Discard the supernatani and dissolve ih* 
PBS (final volume 5 m,.). *S£ 
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50 



4. Add 3.6 ml saturated ammonium sulnhait una. 

1 Jb. T-gel Chromatography: (Buffer: 1M Tris-d „w 7 * 
Solid potassium sulphate) M ins-u. pH 7.6. 

1- Char 2 ml of asci.es fluid by cen.rifugation ., 4000 
2 ' JSJS. 1 * 0 ^ 10 ichieve fi "« concentration 

J- Apply the ascite fluid to the T-gel column. 

3. wash the column with 0 1M Tri*^\ «■ 

3. Ma aod incubate a. A' C. for 16-18 hrs 

f Tepbaix £ 0 Is" en' 611 £r0m "^ated by: 

iepbadex G-25 chromatography for FTTC label. 

" laSf * SePbaCe ' cbronl «°g»Pby for TRITOFTTC 

Buffer system: 
PBS for (a). 

0.005M Phosphate, P H 8.0 and 0.18M 
Phosphaic, pH 8.0 for (b). 

aieuuuo. of rrrc-. p roieio coupiil)g n(io . . 
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EXAMPLE 4 

Reagents

5 Phosphate buffer, 0.1M, pH 7.4

7 Wash buffer: Phosphate buffer 0.1M, pH 7.4,
0.1% Tween 20

8 Black microtitre strips from Dynatech

9 SuperBlock from Pierce

A. Protocol and Conditions for the Radioimmunoassay of
Thyroid Stimulating Hormone (TSH)

1. An aliquot of 50 µl of 50 µg/ml anti-TSH monoclonal
antibody in phosphate buffer was added to microtitre
wells and incubated for 1 hour at room temperature.

2. The microtitre wells were washed with phosphate
buffer, blocked with SuperBlock for 30 minutes at
room temperature and then washed again.

3. An aliquot of 100 µl of TSH standards made up in
Tris-HCl assay buffer to give final concentrations of 0,
1x10^-9, 2x10^-9, 4x10^-9, 8x10^-9, 12x10^-9 and
16x10^-9, or unknown serum samples, and 100 µl of
125I-labelled TSH in Tris-HCl assay buffer were added
to triplicate anti-TSH monoclonal antibody coated
microtitre wells, shaken for 1 hour at room
temperature, washed with wash buffer and counted for
radioactivity. The concentration of TSH in the
unknown samples can be read from the standard curve.

The incubation period of 1 hour for the assay is far less
than the time required for the binding reaction to go to
equilibrium, but, provided the standards are measured under
the same conditions, the unknown sample can be measured
against the standards. The effective affinity constant for
the antibody will of course be that which pertains after 1
hour incubation and under the same conditions as the assay.

B. Procedure for Obtaining the Affinity Constant K of
the Anti-TSH Monoclonal Antibody by Radioimmunoassay
Performed Under the Conditions Described in (A)

1. An aliquot of 50 µl of 50 µg/ml anti-TSH monoclonal
antibody in phosphate buffer was added to microtitre
wells and incubated for 1 hour at room temperature.

2. The microtitre wells were washed with phosphate
buffer, blocked with SuperBlock for 30 minutes at
room temperature and then washed again.

3. An aliquot of 100 µl of TSH standards made up in
Tris-HCl assay buffer to give final concentrations of 0,
1x10^-9, 2x10^-9, 4x10^-9, 8x10^-9, 12x10^-9 and 16x10^-9,
and 100 µl of 125I-labelled TSH in
Tris-HCl assay buffer were added to triplicate antibody
coated microtitre wells, shaken for 1 hour at room
temperature, washed with wash buffer and counted for
radioactivity.

4. A standard Scatchard plot of Bound/Free vs. Bound
TSH was used to obtain the affinity constant K for the
monoclonal anti-TSH antibody.
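
The Scatchard analysis in step 4 can be sketched as a straight-line fit. For a single class of binding sites, Bound/Free = K x (Bmax - Bound), so the slope of Bound/Free against Bound is -K. The code below is a minimal illustration under that single-site assumption; the function name and the synthetic data are hypothetical and are not measurements from the patent.

```python
# Illustrative Scatchard fit; the data below are synthetic, not from the patent.


def scatchard_fit(bound, free):
    """Least-squares line through (Bound, Bound/Free).

    Returns (K, Bmax): K is minus the slope, Bmax is the x-intercept.
    Assumes a single class of independent binding sites.
    """
    ratios = [b / f for b, f in zip(bound, free)]
    n = len(bound)
    mean_b = sum(bound) / n
    mean_r = sum(ratios) / n
    sxx = sum((b - mean_b) ** 2 for b in bound)
    sxy = sum((b - mean_b) * (r - mean_r) for b, r in zip(bound, ratios))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_b
    k = -slope
    b_max = intercept / k
    return k, b_max


# HYPOTHETICAL values used only to generate self-consistent test data.
K_TRUE = 1.0e9      # liters/mole
BMAX_TRUE = 1.0e-10  # moles/liter of binding sites

free_conc = [0.25e-9, 0.5e-9, 1e-9, 2e-9, 4e-9, 8e-9]  # moles/liter
bound_conc = [BMAX_TRUE * K_TRUE * f / (1.0 + K_TRUE * f) for f in free_conc]

k_fit, bmax_fit = scatchard_fit(bound_conc, free_conc)

if __name__ == "__main__":
    print(f"fitted K    = {k_fit:.3e} liters/mole")
    print(f"fitted Bmax = {bmax_fit:.3e} moles/liter")
```

Because the synthetic points lie exactly on the single-site line, the fit returns the values used to generate them; real counting data would scatter about the line and the fitted K would carry a corresponding uncertainty.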

SfS ^ *° Amounl of Capture Antibody 

≤0.1 V/K and Deposited on the Solid-Phase as Microspots
Since the assay volume V is 0.2 ml or 2x10^-4 L and the
affinity constant K of the anti-TSH capture antibody
under the conditions described in (B) was found to be

allowed m the assay under ambient analyte condition 
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aspirated instantly. This procedure resulted in antibodv 
mjoospotswiOt. coated „ea f approar^SylS 



Molar amount of coated antibody in microspot
= 1.7 x 10^-14 M
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-0-1 V/K 

- (0.1 n 2 * 10-*VJ.J « jo^M 

- 1 £ * 1CT U M 



Or a capture antibody concentration of 9X10 30 M/I 
Assay Protocol: /L ' 

1. A 0.5 /il droplet of a monoclonal anu-TSH caoiure 
200 „g/ol was added to each mic roii.re well and 
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or a capture antibody concentration of 0«xl(T 10 M/L 
2. The microtitre wells were w«h^ «x»k J T" 
buffer and the unreacted^ SSS ££3SS 
for 30 minutes at room temperature and th^sn wished 
again with phosphate buffer * SOed 
3. 1 00 of TSH sundards (made up in TSH-free serum) 
or unknown samples plus 100 Ll of Tri< uo j 
buffer were added"* ^ 
forU,our a. room temperature and washed with w!^ 

as mjcrospot on the solid-phase. An aliqu^ofYooS 3 

intib ° dy " Tris - HC1 «"y bX wL 
added to the microtitre wells, shaken for 1 ho U r,i ™™ 

£t WMbed ^ WSSb ^scated 1 ^ 
amou^ i^fl SCa ° n,n8 C0Dfocal microscope and the 
^1° "/fl-orescence on the microspoW and tfc 
amount of fluorescence on the microspot quantified 

were read from the standard curve 
Although, for the purpose of illustration, the affinity
constant of the antibody was determined under the assay
conditions, in practice, in many cases it may not be
necessary actually to perform such a determination if it is
obvious, having regard to the details of the assay in
question, that the amount of capture antibody in any
spot is going to be less than 0.1 V/K.
What is claimed is: 

1. A method for determining ambient concentration of
an analyte of interest among a plurality of analytes in a
liquid sample of volume V liters, comprising:

loading a plurality of different binding agents, each being
labelled with a marker and being capable of reversibly
binding an analyte which is or may be present in the
liquid sample and is specific for said analyte as
compared to the other components of the liquid sample,
onto a support means at a plurality of spaced apart
small spots such that not more than 0.1 V/K moles of
binding agent are present on any spot, where K liters/
mole is the affinity constant of said binding agent for
said analyte;

contacting the loaded support means with the liquid
sample to be analyzed, such that each of the spots is
contacted in the same step with said liquid sample, the
amount of liquid used in said sample being such that
only an insignificant proportion of any analyte present
in said liquid sample becomes bound to said binding
agent specific for said analyte;

contacting the support with a site-recognition reagent
specific for each binding agent in a competitive or
non-competitive technique, the site-recognition reagent
being capable of recognizing either the unfilled binding
sites or the filled binding sites on said binding agent,
said site-recognition reagent being labelled with a
marker different from the marker on said binding agent
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measuring a rata of signals from s.id marker on tbe site 
recognition reagent and u, e bindin reieen , 
k«»P«n f the spot, from which ^ , 0 
interest is deiennioed 10 

atulytes in a liquid sample of V l^lo^^ ° f 
(a) loading a plurality of different binding • 
being capable of reversfcly binding an AS 
or may be present ffl , he u ■„ * - * _* 
for said aoalyie as comoared m .h, ;»{.. s P ec >nc 

bindmg agem for said anaJyic* 

specific for said analyie; and ° 8 agenI 

(c) thereafter contacting t'bt loaded support with sile 

agent, «x ^-recognition ££"*g 
markers from which the fractal bJingZ o£u 

ts are "iJi ^ fl "^n^e s r reco8n,tioD 

si^gnTutrlag^rn EE"" " 

determined consecutively TOpCCUVe b,nd,n 8 »8«> « 

7. Tbe method of claim 4 when-ir. 
stte-recogniuon reagents™ eacIrSl £ E^"" ° f ** 
determined simultaneously P blndln 8 *8«' " 

je defined va,ue of fc ^^^^1 
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loadug a plurdtty of different bioding agen*. each bo. 
capable f«versfclybmdi n gTi^ w St , 2 
«.y be present in the liquid Lp^issSc f* 
*a«d analyte as compared to tbe oiwJlT^ . 

spaced apart small spots such that «5.wE?Zf 
co.Ung densiry of one of said bindmg 21 but ^ 
more man 0.1 V/K moles of bindmg SlS 

stant 0 f Mld bmdmg agent for said analyie ^ 
contacting the loaded support means with '.k. r 

&3S533S 

contacting the support with » . . 

measuring the signal from .be marker of tbe site 
recogmuon reagent in a panicular loca"on to «2L ,t 
presence of said pluraluv 0 f analv^in^J". , 
10. A method as claimed in chin « u Mlnple ' 
spots has a si* of ,T£L W * ^ "* °' 
U A method as claimed in claim 10 wherein «,+ „r 

•s si's?- r ™ S: - 35s ™ 

10 8 to JO 33 liters oer mole. *n*Jyies of from 

14.AmeibodasclaimcdiDclaiinll wh ere in 
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f57J ABSTRACT 

Tbc present invention provides methods for H,. - • 

concentrai oo of analvt« in i ^ • j determining the 

amount of bw^^i^^^ » the 
given analyte in the ^ ^2^^?^ 4 
zone on a soiid simoon ih* k™? '^mobilized in a test 

concentration of toe analyte J ?* CS,20De - 

occupied binding .gent S a ^^ blCk * ti,r * ,in 8 ,be 
marker and n ' m blv »« » 

array. Tbe pm^lSK^T^ location inL 
determining . v,i ue repreimaut STfl' "V*"* tot 
sues of tbe binding ageniwhirh * ,CUon of bindi og 

comprising i«5tSM^*«-l»i! 
»hdsuppon,wb erein ^ s r d ^^. b »°*»8 «geot on , 
fractional occupancy is D r«In, 4,18 18eo ' f ™ 
V/K moles, IT"* leSS "»» °1 

binding to tbe binding T E «t ?J i M, ? te 

binding agent is dividel info 0 ' £L J«? ' he s ^ 

locattons; contacting tbe a^^,?^ W™* 

coniactagih eS upp or , v ^, h , h p j^" 7 ,Ul . "* b< ' u,d sample; 

non-specificaljy ffiS£^5M«* 

signal at each of the location?^ *1 " d me " u ™g Ibe 

represent me fracuon onbe b^ no ° bU,B ' V4,Ue wbicb 
analyte a. eacb location l^?' 6 ""tf* by ibe 
Provide a total s.Jn ,7^2, f,^ 8 ^ v » lu « "> 

binding sites of tbf bLdiSf ' r8C,ion °' 
Test kits and devices used in „ * ,^ P '! d by ,be 
also disclosed. W P r4CUc ">8 ibese methods are 

21 Claims, 3 Drawing Sheets 



U.S. Patent    Nov. 17, 1998    Sheet 1 of 3    5,837,551

[Fig. 1: sensitivity plotted against area of microspot]

[Fig. 2: signal/noise ratio plotted against area of microspot]



U.S. Patent    Nov. 17, 1998    Sheet 3 of 3    5,837,551
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minimicrospot radius = r 
number: N 
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Fig.5. 




Equivalent microspot
microspot radius r r = r/N 
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HELD OF THE INVENTION 
The present invention relates to binding assays for
determining the concentration of analytes in liquid samples.
BACKGROUND TO THE INVENTION 

It is known to measure the concentration of an analyte,
such as a drug or hormone, in a liquid sample by contacting
the liquid with a binding agent having binding sites specific
for the analyte, separating the binding agent having analyte
bound to it and measuring a value representative of the
proportion of the binding sites on the binding agent that are
occupied by analyte (referred to as the fractional
occupancy). Typically, the concentration of the analyte in the
liquid sample can then be determined by comparing the
fractional occupancy against values obtained from a series of
standard solutions containing known concentrations of
analyte.

In the past, the measurement of fractional occupancy has
usually been carried out by back-titration with a labelled
developing reagent using either so-called competitive or
non-competitive methods.
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rypically a labelled ^^^3 ffi.S* h 
agent can be said to compete for SSSS!K 
binding agem with tbe iniJvic who« • ■ . 

measure fraction o?Z Z£2 
occupied with tbe labelled anajyte can then be related^ Z 

jng .gent capable of binding to either me boS anil* or" 
he occupud binding sites on the binding ag«! The fra t 

by detecung the presence of the labelled developing 
and. jus. as witb competitive assays, related to t£^n« D 
trat*„ of the ana.yte in fc liquid Mnlple „ J™ 

In both competitive and non-competitive methods, the
developing agent is labelled with a marker. A variety of
markers have been used in the past, for example radioactive
isotopes, enzymes, chemiluminescent markers and fluorescent
markers.

h.t !ll e „ fie,d f j nfflUD0MS »y. competitive immunoassays 
have in general been earned out in accordance with S 
pnncmles enunciated by Berson and %fc^~ toS 

(1973), pages 111 to 116. Berson and Yalow proposedTal 

jo 100% of the analyte to tbe bquid lofpk How e v « ^ 

» t kS^T eplS ^ V0luffle of «°« sarnie 
lo be known and the amount of binding agent used to be 
accurately known or known to be constant 



effect on the concentration of the analjae in the Toukl 
sample, „ B found Ui| ffle fr 

binding sites on the binding agent by the an.C is effec 
uvely mdependent of the volume of L saToY* 
This approach is further refined in EP3O4.202 whirf, 

amou^of binding agent - W^^SSSS 
form of a microspot. In this assay, i develop 
composing a microsphere containing a marker e e a toZ 
ram dye, is used to back-titrate U? e b^agem .fie? h 

a«ly^ n As°,be SSt"" ^^^J! 
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SUMMARY OF THE INVENTION 






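Broadly, the present invention provides a method, device and test
kit for carrying out a binding assay in which binding agent
having binding sites specific for a given analyte in a liquid
sample is immobilised in a test zone on a solid support, the
binding agent being divided into an array of spatially
separated locations in the test zone, wherein the concentration
of the analyte is obtained by integrating the signal from each
location in the array.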

Accordingly, in one aspect, the present invention provides
a method for determining the concentration of an analyte in
a liquid sample comprising:

(a) immobilising binding agent having binding sites specific for
the analyte in a test zone on a solid support, the binding
agent being divided into an array of spatially separated
locations in the test zone;

(b) contacting the support with the liquid sample so that
a fraction of the binding sites at each location become
occupied by analyte;

(c) measuring a value of a signal representative of the
fraction of the binding sites occupied by the analyte for
each individual location in the array;

(d) integrating the signal value obtained for each location
in the array to provide an integrated signal; and

(e) comparing the integrated signal to corresponding
values, obtained from a series of standard solutions
containing known concentrations of analyte, to determine
the concentration of the analyte in the liquid
sample.
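
Steps (c) to (e) can be sketched in code as a summation followed by a lookup against standards. The example below is a minimal illustration, not the patent's implementation: the function names, the 5 x 5 array of signal values and the calibration points are all hypothetical, and straight-line interpolation between standards is an assumption made for the sake of the example.

```python
from bisect import bisect_left

# Illustrative sketch of steps (c)-(e); all numbers below are hypothetical.


def integrated_signal(location_signals):
    """Step (d): add up the signal values measured at each array location."""
    return sum(location_signals)


def concentration_from_standards(signal, standards):
    """Step (e): compare an integrated signal with integrated standards.

    standards: list of (known_concentration, integrated_signal) pairs, sorted
    so that both values increase. Linear interpolation between neighbouring
    standards is an assumption made here.
    """
    signals = [s for _, s in standards]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal lies outside the range of the standards")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return standards[i][0]
    (c0, s0), (c1, s1) = standards[i - 1], standards[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


# HYPOTHETICAL per-location signals from a 5 x 5 array (step (c)).
sample_locations = [8.0] * 25

# HYPOTHETICAL standards: (concentration, integrated signal over the array).
calibration = [(0.0, 50.0), (1.0, 150.0), (2.0, 250.0)]

sample_total = integrated_signal(sample_locations)
sample_concentration = concentration_from_standards(sample_total, calibration)

if __name__ == "__main__":
    print(f"integrated signal = {sample_total}")
    print(f"concentration     = {sample_concentration}")
```

The point of the design is that the comparison in step (e) is made on the integrated value for the whole array, so each standard is integrated over its locations in the same way as the sample.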

Thus, in the present invention, the values of the signal
from an array of locations in the test zone are used to
determine the concentration of a single analyte. This is in
contrast to the approach described in EP304,202, in which
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when u, o^^^ 

The array of locations of binding ,gen. in (be .est ^ne can S^^^^^^^^^S^M 
be viewed sequentuUy, e . g . ^ ^Z"" mie T° s P 01 kid f, 10 « reduction in tbe sensitMty of ^ 

binding .gem in the test tone can be viewed together" f h ,oul «>««» »«• occupied bySe 

Zfi ' ? bU8 ; ^ ^CD) earner? X the TZTcZTV ^ ^ ,0,aJ ^^f bTnd£ 

signal values from eacfa location being measured simulto *f -L ^ Mme - 

neously. ra sunujia- i n addition, the use of an array of ' 

to*^*^***********,.^ 10 ^^^^^^^^^^^ 

binding sites occupied by binding agent at each location « 2? ° bUmcd for «V given microspo, is in error by 

measured by backdating , be binding agent JZTZ*. ZZV^ I * 

opag «en. baying , marker> me ^ pi J"f^ ter «Pf«.tbe present invention provides a device 

capable of bindmg ,o unoccupied binding sites, bound ,« ! tt r ,M 8 , be concentration of one or more an»]y£sfc 

««Jy.e or to occupied binding sites in a compel or h,!™ SlmpiC ' devke «»>P™ng a solid su™£ 

non^compeutrve method, as described above. ^ " ^llrl " 20Des - «« S 

iJ^™*" ° n "* ^r 1 ^^ ™» ^ a radioactive ZtS "7 " am0U '" ° f bindin 8 «««< b.ving bin*™ 

isotope, an enzyme, a cbemiluminescent marker or a fiuo h ^ ^ for a S™" ,M, 3* » liquid sample the 
rescent marker. The use of fluorescent dye markers ^ ,„ ^ f" b " n * dWded in "> « ™v ol 

espcc".y preferred as, he fluorescent dyes c» K e «ed tkln^'of^ ^cations in the test zone, wherein ^bc^corK^ntra^ 

o prov.de fluorescence of an appropriate colouTlZl "Z< V P™"**' « obtained by mtegrating*Sl 

(excitation and emission wavelength) for detection . f°? MCb 1 ° C " ion » Me 

rescent dyes include coumarin, fluorescein, rbodamine and J" " - J***"' Ule presen ' invenlion Prides a lot for 

Texas Red. Fluorescent dye molecules ba'ving ^oTo "d 2J SS^^T™*" ° f TOe OT «"« •«>>£». 

fluorescent periods can be used, thereby allowmg Urn" * T' d "f' 8 ' Be 101 comprising: ^ 

resolved fluorescence to be used to measure the strength of W * devioe eons P*ing a solid support having 0De or 

iS'T' ** Ml f ter background fluorescent bL ttl "ch test zone h.Wng »S 2 

SoMh^rt roirker can «* incorporated " ".J™?™ ° f biodin 8 °«ving bindinTsi.« 

within oron the surface of latex microspheres attachedto the 30 ! P '? fic for * »«wlyie in a liquid samnk t£ 

developmg agent. This .Uows a large quantitv of mTrker to blDd,ng 4 8 enl bein 8 divided into an a«y of sp^tianv 
be «»cuted w,tb each molecule of deve'lopinT «ent ,0ca,ions in •« «>»e° «d P ' 

« y r. r ^ e,led - 10 45 " n »i-™«ospots" in the relevant wberein «* concentration of a given analvte i, oh ., in h k 
pans of the description that follow. reievani jm gi « analyte u ^obtained by 

The presen, invenUon also .Hows Ute concentration of a a * enl al «ch location in the iTay ^ ° f deVe '° p,0g 
provai 0 . of ^st^tcht 1 ^^ BR,EF DESC «^ON OF THE DRAWINGS 

Mt^r.^^iis^s- « a ^-=~^^ 

associitioD consunt for analvte bindino ^ m. i. j- Pir ■> 

Thisensuresmattbe-ambknTaS ratfo « .h^'^? ,ypiCal Varia,i0D m «gnal-,o-noise 

in WO83/01031 are fidfiuj I rSX^nK.tT^ nr ? ° f ' n " CI0Sp01 Cban 6«; 

centrauon. gW<UeSS ° f ^ . . F J G " 3 re P re *«* bo* diffusion constraints on analvte 

^&reir^ ^it^^ -» u, Slg na, measurement 

m the case of rmcrcspote ,yp ia „ y idj ^ 0|s ^ a D ^ ""^W m,CrOSp0, 
diameter of about 80 an. Alternatively, if jLTr location! «o „f 5 ° WS * con3 P ariso " ""ween the microspot arr.v 
'J"**** «o be used ,0 *cUuo, £ 60 ^.' b aDd PreSem ' nVemion and a "Microspo, oHh p" o y 
anwun. of btndtng agent immobilised a. . location on a p"^ h u . 

P^ 0 "- .. FIG - 6 shows bindi "8 agent immobilised as an arrav of 

JZZT" " ,VeDUOOiS ^ °° Nation tha, as l,0eS " i0 al,erDa,ive "bodiment of the invention ' ° f 
Z^lttttttj&S^-* " DETAILED DESCRIPTION 

— by »e -.^n 
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pbyaco^micil properties (k association and dissociation 
rate constants) of the binding agent, the viscosity of the
analyte containing solution to which the microspot is
exposed, the specific activity of the label used etc.

In all the figures, value A denotes the area of a microspot
typically used in the prior art (typically [illegible]). In all the
figures, the density of binding agent is kept constant.

FIG. 1 shows the experimentally observed variation of
sensitivity as the area of a microspot is reduced. In the
present context, sensitivity can be defined as the lower limit
of detection, which is given by the error (s.d.) with which it
is possible to measure zero signal. As FIG. 1 shows, as the
area is reduced from value A, the sensitivity of the binding
assay reaches a maximum and then declines as the area of
the microspot is further reduced towards zero.

Some of the factors leading to this observation
are depicted in FIGS. 2 to 4.

FIG. 2 shows how the signal-to-noise ratio associated
with the measurement of the occupancy of the binding sites
of the binding agent changes as the size of the microspot
changes towards zero, assuming equilibrium has been
reached. As microspot area is reduced from value A, the
fractional occupancy of the binding sites of the binding
agent reaches a plateau value as the concentration of binding
agent falls below 0.01/K. Therefore, the signal per unit area
from markers on developing agent used to measure the
occupancy of the binding sites by analyte will also reach a
plateau. As the background noise per unit area remains
approximately constant, so the signal-to-noise ratio will
likewise increase to a plateau value as the concentration of
binding agent falls below 0.01/K.
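
The plateau in fractional occupancy can be illustrated from the law of mass action for a single reversible binding reaction. The sketch below is an illustration under that assumption, not a calculation taken from the patent; the function names and the numerical values of K and analyte concentration are hypothetical. With total analyte concentration a, total binding-site concentration b and bound concentration x, equilibrium requires K(a - x)(b - x) = x, and the fractional occupancy is x/b.

```python
import math

# Illustrative mass-action calculation; K and the analyte level are hypothetical.


def fractional_occupancy(k, analyte_total, sites_total):
    """Equilibrium fraction of binding sites occupied by analyte.

    k: affinity constant (liters/mole); concentrations in moles/liter.
    Solves K(a - x)(b - x) = x for the bound concentration x.
    """
    u = k * analyte_total
    v = k * sites_total
    s = u + v + 1.0
    # Smaller root of y^2 - s*y + u*v = 0 with y = K*x, in a form that
    # avoids subtracting two nearly equal numbers.
    y = 2.0 * u * v / (s + math.sqrt(s * s - 4.0 * u * v))
    return y / v


def ambient_occupancy(k, analyte_total):
    """Limiting occupancy when the binding agent does not deplete the analyte."""
    u = k * analyte_total
    return u / (1.0 + u)


K = 1.0e9        # HYPOTHETICAL affinity constant, liters/mole
ANALYTE = 1.0e-9  # HYPOTHETICAL analyte concentration, moles/liter

plateau = ambient_occupancy(K, ANALYTE)
deviation = {
    multiple: 1.0 - fractional_occupancy(K, ANALYTE, multiple / K) / plateau
    for multiple in (10.0, 1.0, 0.1, 0.01, 0.001)
}

if __name__ == "__main__":
    for multiple, dev in deviation.items():
        print(f"binding agent = {multiple:>6} / K : "
              f"occupancy is {100 * dev:.2f}% below the plateau")
```

For these illustrative values the occupancy at a binding agent concentration of 0.01/K lies within one percent of the limiting value, whereas at 10/K it is far below it, which matches the qualitative shape described for FIG. 2.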

FIG. 3 shows how diffusion constraints change as the area
of a microspot is reduced. "Diffusion constraints" restrict the
rate at which analyte migrates towards and binds to the
binding agent. As FIG. 3 shows, the diffusion constraints
decrease as microspot size is decreased, i.e. the kinetics of the
binding process are faster for smaller microspots, implying
that thermodynamic equilibrium is reached more rapidly.
On a molecular level, this phenomenon can be pictured as
follows. When a microspot containing binding agent is
exposed to a sample containing analyte, the binding of
analyte to the binding agent depletes the local concentration of
analyte as compared to the liquid sample as a whole. This
leads to a concentration gradient being established in the
region of the microspot until thermodynamic equilibrium is
reached. This process is found to be slower for larger
microspots, the diffusion constraint being approximately
proportional to microspot radius. When the occupancy of
binding sites on the binding agent has reached an equilibrium
value, the concentration of analyte in the liquid sample
is uniform. However, equilibrium is reached more rapidly in
the case of microspots of smaller size, implying that for any
incubation time less than that required to reach equilibrium
in the case of the larger spot, the fractional occupancy of the
binding sites on the smaller spot is greater.

However, as microspot area decreases, so the amount of
binding agent and the level of signal from developing agent
will likewise decrease. This leads to an increase in the
statistical errors in the measurement of the signal from a
marker on a developing agent, which tends to infinity as
microspot area tends to zero (see FIG. 4).

As has been seen, consideration of the signal-to-noise
ratio and diffusion constraints indicates an increase in the
sensitivity of a binding assay as the area of a microspot is
decreased. However, these factors are opposed by an
increase in the statistical error of signal measurement as the
microspot area decreases. These factors combine to produce
the observed variation of sensitivity with microspot area
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shown in FIG. 1. Thus, the overall consequence is that „ 

buiaing assays using microspots of the smallest pcJblelize 
containing vanisbingly small amounts of bitxJkg^enL t£! 
have^pid kinetics to minimise the time Ukenl^S 

The present invention improves sensitivity and reduces
binding assay incubation times by exploiting the contradictory
effects discussed above to maximal advantage. This is
done by subdividing the total amount of binding agent into
an array of spatially separated locations, such as "mini-
microspots", to reduce diffusion constraints, and integrating
the signals representative of the fractional occupancy of
the binding agent at each location to obtain a sensitivity greater
than would have been achieved by using a single microspot
equal in area to the total area occupied by the minimicrospots
comprising the minimicrospot array.

This implies, inter alia, that the amount of binding
agent used can be made even smaller than in the prior art,
where a balance between kinetics and signal-to-noise ratio
relative to statistical errors had to be made to optimise
sensitivity. The present invention therefore can improve the
signal-to-noise ratio associated with measuring the analyte
bound to binding agent, whilst reducing the diffusion
constraints associated with each microspot in the array.

The increased statistical errors observed in the
prior art as microspot size is reduced are obviated, as the
signal generated from the occupied binding sites by analyte
in the individual microspots is integrated over the array to
provide an integrated signal, thereby retaining the signal
measurement advantage observed for larger microspots.
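
The statistical benefit of integrating over the array can be illustrated with a simple counting model. The sketch below assumes Poisson counting statistics, in which the relative error of a count with expectation n is 1/sqrt(n); that noise model, the function name and the count of 400 per minimicrospot are assumptions introduced here for illustration and are not stated in the patent. The array size of 25 follows the division of a single microspot into 25 microspots described for FIG. 5.

```python
import math

# Illustrative only: Poisson counting noise is an assumed model, and the
# counts per spot are hypothetical.


def relative_counting_error(expected_counts):
    """Relative standard deviation of a Poisson-distributed count."""
    return 1.0 / math.sqrt(expected_counts)


COUNTS_PER_MINISPOT = 400.0  # HYPOTHETICAL expected counts from one location
N_LOCATIONS = 25             # array size used in the FIG. 5 description

single_spot_error = relative_counting_error(COUNTS_PER_MINISPOT)
integrated_error = relative_counting_error(COUNTS_PER_MINISPOT * N_LOCATIONS)
improvement = single_spot_error / integrated_error

if __name__ == "__main__":
    print(f"one minimicrospot : {100 * single_spot_error:.1f}% relative error")
    print(f"integrated array  : {100 * integrated_error:.1f}% relative error")
    print(f"improvement factor: {improvement:.1f}")
```

Under this model the relative error of the integrated signal falls as the square root of the number of locations, so the array keeps the counting precision of a large spot while each location individually keeps the faster kinetics of a small one.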

FIG. 5 shows how a single microspot of the prior art
can be divided into an array of 25 microspots containing an
equivalent total amount of binding agent.

Nevertheless, other arrangements or geometries of binding
agent providing assays yielding the same benefits can be
envisaged, see for instance FIG. 6 which shows binding
agent immobilised as lines forming a grid (i.e. the shaded
area). This arrangement likewise has the effect of reducing
the diffusion constraints whilst maintaining the total area
coated with binding agent (e.g. an antibody) to obviate the
increasing statistical errors and associated loss of sensitivity
observed as the amount of binding agent is reduced.
The amount and distribution of the binding agent in the
locations comprising the array depends on a variety of
factors including the diffusion characteristics of the analyte,
the nature and viscosity of the liquid sample containing the
analyte and the protocol used during incubation. However,
given the guidance here, the skilled person can readily
determine, either experimentally or by computer modelling,
a suitable arrangement or geometry of array for a given
assay.

EXAMPLE 

Conjugation of Anti-TSH (Anti-Thyroid
Stimulating Hormone) Mouse Monoclonal
Antibody to Fluorescent Hydrophilic Latex
Microspheres



0 5 mi Hn hi ^ U ° r n S ? D ' b - vdro P«>flic l»tex microspheres in 

TWEEN an s rf S " Ued Wa ' er WCre 4dded 10 °- 5 °* of 1% 
TWEEN 20, surface-acuve agent, shaken for 15 min at room 
.cmpera.ure and cenirifuged a. 8' C. for 10 min a. 20 «» 
rpm 10 a MSE High-Spin 20 ultr.centriruge 

tlM m!. ^"r' T" d,spersed in 2 ml of 0.05M MES 
(2-[N ; Morpholino] etbanesulfonic acid) buffer. pH6 1 and 
cenu-ifuged. «ua 



5,837451 



3. Step 2 was repeated. 

4. The pellel wasdispersed in 08 m j MES buffer 
»in « room taJLS? «" ^ for 15 
shaken for 2 hours „ ^ aod 

- i£SS£ tTi 0 ^^ added - 

8. T*c pefle, was di££??2 J 
Serum Albumin), shakenfor ] hn„, .7 BSA 
cenirifiiged. " " t00a "nnpennire and 

9. ThepeUetwtsdjspeisedin2mJofl%n<:A .i. i. r 
1 tar room temperature .^SS^^* 

11- Step 10 was repeated twice 

Comparison of Kinetics of Micro v,f«, e - 
("V"* Stimulating Hormone) Assay 

MiaoFtaor Microtia we?£ «£L" Dyna,ech black 
immediately, the wells btkS ^? c* WMe aspira,ed 
Pierce for 30 min a, Lm^l^ Su P" Block ^m 
0.1M phosphate ^^4 ^ W8Sbcd 

solution Jy^KK aD f am, '- b0d - V 

mately 100 p, picolner *tt? r tt&Z7* 

S-Pe^ckaod^ blocked 
coaled antibody densi.y fejfffi? 85 above - ^ 
are estimated to be *Jr£S£F° ^ m °"«»<CTOspois 

■ ^et^^^^ - add « d 

Al 30. 60. 120 min and 18 t^Z ? " "?? ,en, Pe»nire. 
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faster kinetic for ib^SSaS" of T^"^ 
antfcody, and couW b^S to £ ™ ^ 
The invention S£j£ 

each anafy te . i^^T ""^ "^i" 8 ' for 

wherein ^spec*: ^^i^tl""™- 
the concentration of the analvi' u Ermine 
less than 0.1 V/K mote 3£ v'T' "1 " amoum 
bquid sample and K i^.! V ° luoe of lbe 

agent, and wherein said Lctfch?^ b ' Dd,n8 
divided into an array of L^.i ' Ddm8 a * enI is 
w «u array oi spatially separated location.- 
(b) contacung the support with ih, , cal,ons * 
fraction of lbe bi«S of ' mplt I S ° ,bat a 
age. specific for 

^^^^ 
labelled developin?L P nS g ,J DaXkerSucb lbat *e 

st.es wnb specifically bound anl tt b,DdlDg 

^cedby,bemSrea cbo 7^ S r B ,bE SigBaJ pro " 
■o obtain' , v,l«1^^ , 2^ l »»» a,Tay 
binding sites occupied bvT^!^. 1 " 6 ir ' aiOD of tbe 
(e) adding tbe vaiuesobtau^ a J MCb 
«o provide a total signed l0CaU ° DS m ™y 

(0 sr:rei r^r^ 6 

known concecation! o ^ d n%"t°r, COmainiD8 

>n of a pjuraliiv of difW„, .^..1 wtc t reiD . ^ conccrj. 
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25 



washed with phospbate.TWEOJ 2n ^ ""P 8 " 1 "" and 
ihen scanned wi^ . ..ZT^ ?° buff «- ^ ^ were 



U»e„ scanned^ a u^r^" 3" ^ weUs — 
equiped with an Argo./Kryp^T^ ^ "^P 6 



Results 


iscufaatioB 


Tottl Fluonsccal Signal 
(irbiiwy t.«;.-\ 
5. The method according to claim 1, wherein (i) the
specific binding agent is an antibody and the analyte is an
antigen or (ii) the specific binding agent is an oligonucleotide
and the analyte is a nucleic acid.

each analyte. the s'tej^' ' mC ' b ° d ^^P^ing. fo, 
(a) immobilizing a specific binding agent including binding
sites specific for the analyte on a solid support,
wherein the specific binding agent used to determine
the concentration of the analyte is present in an amount
less than 0.1 V/K moles, where V is the volume of the
liquid sample and K is the association constant for the
analyte specifically binding to the specific binding
agent, and wherein said specific binding agent is
divided into an array of spatially separated locations;

.pecific for the ana.yte 
(c) contacting the support with a developing agent
labelled with a signal-producing marker such that the
labelled developing agent binds to unoccupied binding
sites, to specifically bound analyte or to the binding
sites with specifically bound analyte;
(d) separating non-specifically bound developing agent
from the solid support and measuring the signal produced
by the marker at each of the locations in the array
to obtain a value which represents the fraction of the
binding sites occupied by the analyte at each location;
and

(e) adding the values obtained at the locations in the array
to provide a total signal which indicates the concentration
of the analyte in the liquid sample.

7. The method according to claim 6, wherein the specific
binding agent is divided into between 4 and 40 locations.

8. The method according to claim 6, wherein the locations
are in an area of about 10000 μm², the locations being
separated from each other by a distance of 100 to 1000 μm.

9. The method according to claim 6, wherein the concentrations
of a plurality of different analytes in the liquid
sample are determined using a plurality of arrays on said
support.

10. The method according to claim 6, wherein (i) the
specific binding agent is an antibody and the analyte is an
antigen or (ii) the specific binding agent is an oligonucleotide
and the analyte is a nucleic acid.

11. A method for determining the concentration of at least
one analyte in a liquid sample, said method employing a
solid support on which is immobilized, for each analyte, a
specific binding agent including binding sites specific for the
analyte, wherein the specific binding agent used to determine
the concentration of the analyte is present in an amount
less than 0.1 V/K moles, where V is the volume of the liquid
sample and K is the association constant for the analyte
specifically binding to the specific binding agent, and
wherein said specific binding agent is divided into an array
of spatially separated locations, said method comprising the
steps of:

(a) contacting the support with the sample so that a
fraction of the binding sites of the specific binding
agent specific for the analyte specifically binds the
analyte;

(b) contacting the support with a developing agent
labelled with a signal-producing marker such that the
labelled developing agent binds to unoccupied binding
sites, to specifically bound analyte or to the binding
sites with specifically bound analyte;

(c) separating non-specifically bound developing agent
from the solid support and measuring the signal produced
by the marker at each of the locations in the array
to obtain a value which represents the fraction of the
binding sites occupied by the analyte at each location;

(d) adding the values obtained at the locations in the array
to provide a total signal; and

(e) comparing the total signal to corresponding values
obtained from a series of standard solutions containing
known concentrations of the analyte, to determine the
concentration of the analyte in the liquid sample.

12. The method according to claim 11, wherein the
specific binding agent is divided into between 4 and 40
locations.






13. The method according to claim 11, wherein the
locations have an area of about 10000 μm², the locations
being separated from each other by a distance of 100 to 1000
μm.

14. The method according to claim 11, wherein the
concentrations of a plurality of different analytes in the
liquid sample are determined using a plurality of arrays on
said support.

15. The method according to claim 11, wherein the

Imjcn g agCDI * " anlibod y and *** * " 

16. The method according to claim 11, wherein the

17. A method for determining a value representative of a
fraction of binding sites of a specific binding agent including
binding sites specific for an analyte which binding sites are
occupied by the analyte present in a liquid sample, said
method comprising the steps of:

(a) immobilizing the specific binding agent on a solid
support, wherein the specific binding agent used for the
fractional occupancy determination is present in an
amount less than 0.1 V/K moles, where V is the volume
of the liquid sample and K is the association constant
for the analyte specifically binding to the specific
binding agent, and wherein said specific binding agent
is divided into an array of spatially separated locations;

(b) contacting the support with the liquid sample so that
a fraction of the binding sites of the binding agent
specific for the analyte specifically bind the analyte;

(c) contacting the support with a developing agent
labelled with a signal-producing marker such that the
labelled developing agent binds to unoccupied binding
sites, to specifically bound analyte or to the binding
sites with specifically bound analyte;

(d) separating non-specifically bound developing agent
from the solid support and measuring the signal produced
by the marker at each of the locations in the array
to obtain a value which represents the fraction of the
binding sites occupied by the analyte at each location;
and

(e) adding the values obtained at the locations in the array
to provide a total signal which indicates the fraction of
the binding sites in the specific binding agent occupied
by the analyte.

18. The method according to claim 17, wherein the
specific binding agent is divided into between 4 and 40
locations.

19. The method according to claim 17, wherein the
locations are in an area of about 10000 μm², the locations
being separated from each other by a distance of 100 to 1000
μm.

20. The method according to claim 17, wherein the
fraction of occupied binding sites is determined for a plurality
of different analytes in the liquid sample using a
plurality of arrays on said support.

21. The method according to claim 17, wherein (i) the
specific binding agent is an antibody and the analyte is an
antigen or (ii) the specific binding agent is an oligonucleotide
and the analyte is a nucleic acid.
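
The arithmetic recited in the claims (the 0.1 V/K limit on the amount of binding agent, the addition of the per-location values, and the comparison of the total signal with standard solutions) can be illustrated with a short Python sketch. The function names, the units (litres and litres per mole) and the use of linear interpolation between standards are assumptions made for illustration only; the claims do not prescribe any particular calibration procedure.

```python
from bisect import bisect_left


def max_binding_agent_moles(sample_volume_l, association_constant_l_per_mol):
    """Upper bound recited in the claims: 0.1 * V / K moles.

    Assumes V in litres and K in litres per mole, so V / K is in moles.
    """
    return 0.1 * sample_volume_l / association_constant_l_per_mol


def total_signal(location_signals):
    """Add the values measured at each location in the array."""
    return sum(location_signals)


def concentration_from_standards(signal, standards):
    """Estimate a concentration from (concentration, total_signal) standards.

    Hypothetical calibration: linear interpolation between neighbouring
    standards, assuming signal increases with concentration.
    """
    pts = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in pts]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal lies outside the calibrated range")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return pts[i][0]
    (c0, s0), (c1, s1) = pts[i - 1], pts[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)
```

For instance, a total signal of 15 measured against standards giving signals of 0, 10 and 20 at concentrations 0, 1 and 2 (arbitrary units) would be read as a concentration of 1.5 under these assumptions.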
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IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



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DECLARATION OF TOD BEDILION, Ph.D. 
UNDER 37 C.F.R. § 1.132 



I, TOD BEDILION, Ph.D., declare and state as 

follows: 

1. In April, 1996, I became the first employee of 
Synteni, Inc., where I served as Research Director until its 
acquisition by Incyte Corporation in early 1998. After 
Synteni's acquisition, I continued in the position of Director
of Corporate Development at Incyte until May 11, 2001. I am
currently the Director of Business Development at Genomic
Health, Inc., Redwood City, California and an occasional 
Consultant to Incyte. 

2. Synteni was founded to commercialize expression 
microarrays, microarrays in which expressed nucleic acids — 
full-length cDNAs, fragments of full-length cDNAs, expressed
sequence tags (ESTs) — are arrayed on a common support to 
permit highly parallel detection and measurement of the 
expression of their cognate genes in a biological sample. 

3. During my employ at Synteni, virtually all (if 
not all) of my work efforts were directed to the further 
technical development and the commercial exploitation of that 
microarray technology; given the small size of our shop, most
of us had both technical and commercial responsibilities. The 
customer accounts for which I was personally responsible 
included large pharmaceutical companies, such as SmithKline 



Beecham, large biotechnology companies, such as Genentech, and
small research institutes, such as DNAX Inc.

4. From my very first interaction with our
customers, consistently through to Synteni's acquisition by
Incyte, I heard uniform, consistent, and emphatic requests
that more genes be added to the arrays. This was true with
respect to both our original microarrays, based on customer-provided
genes and libraries, and our later, "generic", gene
expression microarrays, based upon the UniGene clone
collection (our so-called "UniGEM" arrays). From day 1, the
pressure on us was to print ever more spots on the array. It
was never a question: our customers wanted ever more genes on
the array, each new gene-specific probe providing
incrementally more value to the customer. 1

5. As a commercial enterprise, providing value to
our customers was our major concern. Thus, to increase the
value of our products and services in the marketplace - to
increase our ability to sell our microarrays and microarray
services, their "salability" - our efforts from the very
beginning were devoted to increasing the number of specific
genes whose expression could be detected with our microarrays.

6. Indeed, one of our major competitive advantages
in the marketplace - not just as regards other commercial
suppliers, but also with respect to the innumerable
laboratories and companies that were attempting to spot arrays
in their own facilities - was the number of

encoded 9 e». product wj kno™° bS Srf Ltt°'°f Cil f " nc "°° ° f 

and all expressed genes. """" for Probes specific Co any 



distinct gene-specific probes that we provided on our
expression microarrays. Our first 10,000 element UniGEM array
put the holy grail of gene expression analysis - the human
whole genome array - within sight for the very first time
(with respect to timing of the UniGEM program we began project
planning and technology development in mid 1996 and delivered
our first 10,000 element standard content human arrays in the
first months of 1997 as I recall).

7. By the end of 1997, our efforts to provide the 
most comprehensive, and thus most valuable, human gene 
expression microarrays had been sufficiently successful that 
Incyte agreed to acquire Synteni for a reported $80 million. 

8. I declare further that all statements made 
herein of my own knowledge are true and that all statements
made on information and belief are believed to be true, and 
further that these statements were made with the knowledge 
that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under 
Section 1001 of Title 18 of the United States Code and may 
jeopardize the validity of any patent application in which 
this declaration is filed or any patent that issues thereon. 



Tod Bedilion, Ph.D. Date 



IYER DECLARATION 

Docket No.: PF-0300-3 CON 
USSN: 09/745,506



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE



DECLARATION OF VISHWANATH R. IYER, Ph.D.
UNDER 37 C.F.R. § 1.132



I, VISHWANATH R. IYER, Ph.D., declare and state as
follows:


1. I am an Assistant Professor in the Section of 
Molecular Genetics and Microbiology, Institute of Cellular and 
Molecular Biology, University of Texas at Austin, where my 
laboratory currently studies global transcriptional control in 
yeast, gene expression programs during human cell 
proliferation, and genome-wide transcription factor targets in 
yeast and human. Immediately prior to this position, I spent 
four years as a postdoctoral fellow in the laboratory of 
Patrick O. Brown at Stanford University studying the
transcriptional programs of yeast and of human cells. My
curriculum vitae is attached hereto as Exhibit A.

2. Beginning in Dr. Brown's laboratory, where I
helped to develop the first whole genome arrays for yeast and 
early versions of highly representative cDNA arrays for human 
cells, and continuing to the present day, I have used 
microarray-based gene expression analysis as a principal 
approach in much of my research. 



3. Representative publications describing this 
work include: 



DeRisi J. et al., "Exploring the metabolic and
genetic control of gene expression on a genomic
scale," Science 278:680-686 (1997); 1

Marton et al., "Drug target validation and
identification of secondary drug target effects
using DNA microarrays," Nature Med. 4:1293-1301
(1998); 2

Iyer et al., "The transcriptional program in
the response of human fibroblasts to serum,"
Science 283:83-87 (1999); 3 and

Ross et al., "Systematic variation in gene
expression patterns in human cancer cell lines,"
Nature Genetics 24:227-235 (2000). 4

Two of the papers describe our use of microarray-based 
expression profiling to explore the metabolic reprogramming 
that occurs during major environmental changes, both in yeast 
(DeRisi et al., during the shift from fermentation to 
respiration) and in human cells (Iyer et al., human
fibroblasts exposed to serum). One reference describes our
use of expression profile analysis in drug target validation
and identification of secondary drug effects (Marton et al.).
And one describes our use of expression profiling as a
molecular phenotyping tool to discriminate among human cancer
cells (Ross et al.).

4. Whether used to elucidate basic physiological 
responses, to study primary and secondary drug effects, or to 
discriminate and classify human cancers, expression profiling 



1 Attached hereto as Exhibit B.
2 Attached hereto as Exhibit C.
3 Attached hereto as Exhibit D.
4 Attached hereto as Exhibit E.
as we have practiced it relies for its power on comparison of
patterns of expression.

5. For example, we have demonstrated that we can
use the presence or absence of a characteristic drug
"signature" pattern of altered gene expression in drug-treated
cells to explore the mechanism of drug action, and to identify
secondary effects that can signal potentially deleterious drug
side effects. As another example, we have demonstrated that
gene expression patterns can be used to classify human tumor
cell lines. While it is of course advantageous to know the
biological function of the encoded gene products in order to
reach a better understanding of the cellular mechanisms
underlying these results, these pattern-based analyses do not
require knowledge of the biological function of the encoded
proteins.

6. The resolution of the patterns used in such
comparisons is determined by the number of genes detected: the
greater the number of genes detected, the higher the
resolution of the pattern. It goes without saying that higher
resolution patterns are generally more useful in such
comparisons than lower resolution patterns. With such higher
resolutions comes a correspondingly higher degree of
statistical confidence for distinguishing different patterns
as well as identifying similar ones.

7. Each gene included as a probe on a microarray
provides a signal that is specific to the cognate transcript 
at least to a first approximation. 5 Each new gene-specific 

5 In a more nuanced view it- i e t 

s i9 „ al th e presence o £ e ^ $ I'l^nT. 
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probe added to a microarray thus increases the number of genes 
detectable by the device, increasing the resolving power of 
the device. As I note above, higher resolution patterns are 
generally more useful in comparisons than lower resolution 
patterns. Accordingly, each new gene probe added to a 
microarray increases the usefulness of the device in gene 
expression profiling analyses. This proposition is so well- 
established as to be virtually an axiom in the art, and has 
been as long as I have been working in the field, and 
certainly since the time I embarked on the production of whole 
genome arrays in early 1996. Simply put, arrays with fewer 
gene-specific probes are inferior to arrays with more gene- 
specific probes. 

8. For example, our ability to subdivide cancers 
into discriminable classes by expression profiling is limited 
by the resolution of the patterns produced. With more genes 
contributing to the expression patterns, we can potentially 
draw finer distinctions among the patterns, thus subdividing 
otherwise indistinguishable cancers into a greater number of 
classes; the greater the number of classes, the greater the 
likelihood that the cancers classified together will respond 
similarly to therapeutic intervention, permitting better 
individualization of therapy and, we hope, better treatment 
outcomes.

9. If a gene does not change expression in an 
experiment, or if a gene is not expressed and produces no 



(-Continued) 

without discriminating among them, and for a probe to signal the presence 
of a variety of allelic variants of a single gene, again without 
discriminating among them. 



signal in an experiment, that is not to say that the probe 
lacks usefulness on the array; it only means that an 
insufficient number of conditions have been sampled to 
identify expression changes. In fact, an experiment showing
that a gene is not expressed or that its expression level does 
not change can be equally informative. To provide maximum 
versatility as a research tool, the microarray should 
include - and as a biologist I would want my microarray to 
include - each newly identified gene as a probe. 

10. I declare further that all statements made 
herein of my own knowledge are true and that all statements 
made on information and belief are believed to be true, and 
further that these statements were made with the knowledge 
that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under 
Section 1001 of Title 18 of the United States Code and may 
jeopardize the validity of any patent application in which 
this declaration is filed or any patent that issues thereon. 
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Exploring the Metabolic and Genetic Control of 
Gene Expression on a Genomic Scale 

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used 
to carry out a comprehensive investigation of the temporal program of gene expression 
accompanying the metabolic shift from fermentation to respiration. The expression 
profiles observed for genes with known metabolic functions pointed to features of the 
metabolic reprogramming that occur during the diauxic shift, and the expression patterns 
of many previously uncharacterized genes provided clues to their possible functions. The 
same DNA microarrays were also used to identify genes whose expression was affected 
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcrip- 
tional activator YAP1. These results demonstrate the feasibility and utility of this ap- 
proach to genomewide exploration of gene expression patterns. 



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2),
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially 
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favorable organism in which to conduct a 
systematic investigation of gene expression. 
The genes are easy to recognize in the ge- 
nome sequence, cis regulatory elements are 
generally compact and close to the tran- 
scription units, much is already known 
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its 
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by 



using a simple robotic printing device (9). 
Cells from an exponentially growing culture 
of yeast were inoculated into fresh medium 
and grown at 30°C for 21 hours. After an 
initial 9 hours of growth, samples were har- 
vested at seven successive 2-hour intervals, 
and mRNA was isolated (10). Fluorescently 
labeled cDNA was prepared by reverse transcription
in the presence of Cy3(green)- or Cy5(red)-labeled
deoxyuridine triphosphate (dUTP) (11) and then hybridized to
the microarrays (12). To maximize the re- 
liability with which changes in expression 
levels could be discerned, we labeled cDNA 
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3- 
labeled "reference" cDNA sample prepared 
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity 
measured for the Cy3 and Cy5 fluors at 
each array element provides a reliable mea- 
sure of the relative abundance of the corre- 
sponding mRNA in the two cell popula- 
tions (Fig. 1). Data from the series of seven 
samples (Fig. 2), consisting of more than 
43,000 expression-ratio measurements, 
were organized into a database to facilitate 
efficient exploration and analysis of the 
results. This database is publicly available 
on the Internet (13). 
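
The expression-ratio measurement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the use of a base-2 logarithm and the intensity floor are hypothetical choices, not the processing actually used to build the database.

```python
import math


def log2_ratios(cy5, cy3, floor=1.0):
    """Per-element log2(Cy5 / Cy3) for one two-color hybridization.

    cy5: intensities from the time-point sample (Cy5-labeled).
    cy3: intensities from the reference sample (Cy3-labeled).
    floor: hypothetical minimum intensity, used to avoid dividing by zero.
    """
    if len(cy5) != len(cy3):
        raise ValueError("channels must cover the same array elements")
    return [
        math.log2(max(red, floor) / max(green, floor))
        for red, green in zip(cy5, cy3)
    ]
```

On this scale a value of +1 corresponds to a twofold increase relative to the reference sample, and a value of -1 to a twofold decrease.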

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the 
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels dif- 
fered by a factor of 2 or more for only 19 
genes (0.3%), and the largest of these dif- 
ferences was only 2.7-fold (14). However, as 
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for 
203 genes diminished by a factor of at least 
4. About half of these differentially ex- 
pressed genes have no currently recognized 
function and are not yet named. Indeed, 
more than 400 of the differentially ex- 
pressed genes have no apparent homology 
to any gene whose function is known (15).
The responses of these previously uncharacterized
genes to the diauxic shift therefore
provide the first small clue to their possible
roles.
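
Counts such as "induced by a factor of at least 2" or "diminished by a factor of at least 4" amount to thresholding each gene's expression ratio. A minimal Python sketch follows, assuming each gene is summarized by a single positive sample-to-reference ratio; the function name and that summarization are illustrative assumptions, not the authors' analysis.

```python
def count_changes(fold_changes, threshold=2.0):
    """Count genes induced or repressed by at least `threshold`-fold.

    fold_changes: positive sample/reference expression ratios, one per gene.
    Returns (number induced, number repressed).
    """
    induced = sum(1 for ratio in fold_changes if ratio >= threshold)
    repressed = sum(1 for ratio in fold_changes if ratio <= 1.0 / threshold)
    return induced, repressed
```

Raising the threshold from 2 to 4 can only shrink both counts, which is consistent with the smaller numbers reported for fourfold changes.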

The global view of changes in expres- 
sion of genes with known functions pro- 
vides a vivid picture of the way in which 
the cell adapts to a changing environ- 
ment. Figure 3 shows a portion of the yeast 
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes 
coding for the enzymes aldehyde dehydro- 
genase (ALD2) and acetyl-coenzyme A 
(CoA) synthase (ACS1), which func- 
tion together to convert the products of 
alcohol dehydrogenase into acetyl-CoA, 
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away 
from acetaldehyde, and instead to oxalac- 
etate, where it can serve to supply the 
TCA cycle and gluconeogenesis. Induc- 
tion of the pivotal genes PCK1, encoding 
phosphoenolpyruvate carboxykinase, and 
FBP1, encoding fructose 1,6-bisphos- 
phatase, switches the directions of two key 
irreversible steps in glycolysis, reversing 
the flow of metabolites along the revers- 
ible steps of the glycolytic pathway toward 
the essential biosynthetic precursor, glu- 
cose-6-phosphate. Induction of the genes 
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of 
genes encoding pivotal enzymes can pro- 
vide insight into metabolic reprogram- 
ming, the behavior of large groups of func- 
tionally related genes can provide a broad 
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate cy- 
cle and carbohydrate storage, were coordi- 
nately induced by glucose exhaustion. In 
contrast, genes devoted to protein synthe- 
sis, including ribosomal proteins, tRNA 
synthetases, and translation elongation 
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy 
and illuminating exception was that the 
genes encoding mitochondrial ribosomal 
proteins were generally induced rather than 
repressed after glucose limitation, high- 
lighting the requirement for mitochondrial 
biogenesis (13). As more is learned about 
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment 
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of ex- 
pression could be recognized, and sets of 
genes could be grouped on the basis of the 
similarities in their expression patterns. The 
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be 
inferred for sets of genes with similar expres- 
sion profiles. For example, seven genes 
showed a late induction profile, with mRNA 
levels increasing by more than ninefold at 
the last timepoint but less than threefold at 
the preceding timepoint (Fig. 5B). All of 
these genes were known to be glucose-re- 
pressed, and five of the seven were previously 
noted to share a common upstream activat- 
ing sequence (UAS), the carbon source re- 
sponse element (CSRE) (16-20). A search 
in the promoter regions of the remaining two 
genes, ACR1 and IDP2, revealed that 
ACR1, a gene essential for ACS1 activity, 
also possessed a consensus CSRE motif, but 
interestingly, IDP2 did not. A search of the 
entire yeast genome sequence for the con- 
sensus CSRE motif revealed only four addi- 
tional candidate genes, none of which 
showed a similar induction. 

Examples from additional groups of 
genes that shared expression profiles are 
illustrated in Fig. 5, C through F. The 
sequences upstream of the named genes in 
Fig. 5C all contain stress response ele- 
ments (STRE), and with the exception 




Fig. 1. Yeast genome microarray. The actual size of the microarray is 18 mm by 18 mm. The 
microarray was printed as described (9). This image was obtained with the same fluorescent 
scanning confocal microscope used to collect all the data we report (49). A fluorescently labeled 
cDNA probe was prepared from mRNA isolated from cells harvested shortly after inoculation (culture 
density of <5 × 10^6 cells/ml and media glucose level of 19 g/liter) by reverse transcription in the 
presence of Cy3-dUTP. Similarly, a second probe was prepared from mRNA isolated from cells taken 
from the same culture 9.5 hours later (culture density of ~2 × 10^8 cells/ml, with a glucose level of 
<0.2 g/liter) by reverse transcription in the presence of Cy5-dUTP. In this image, hybridization of the 
Cy3-dUTP-labeled cDNA (that is, mRNA expression at the initial timepoint) is represented as a green 
signal, and hybridization of Cy5-dUTP-labeled cDNA (that is, mRNA expression at 9.5 hours) is 
represented as a red signal. Thus, genes induced or repressed after the diauxic shift appear in this 
image as red and green spots, respectively. Genes expressed at roughly equal levels before and after 
the diauxic shift appear in this image as yellow spots. 

of HSP42, have previously been shown to 
be controlled at least in part by these 
elements (21-24). Inspection of the se- 
quences upstream of HSP42 and the two 
uncharacterized genes shown in Fig. 5C, 
YKL026c, a hypothetical protein with 
similarity to glutathione peroxidase, and 
YGR043c, a putative transaldolase, re- 
vealed that each of these genes also pos- 
sesses repeated upstream copies of the stress- 
responsive CCCCT motif. Of the 13 ad- 
ditional genes in the yeast genome that 
shared this expression profile [including 
HSP30, ALD2, OM45, and 10 uncharac- 
terized ORFs (25)], nine contained one or 
more recognizable STRE sites in their up- 
stream regions. 

The heterotrimeric transcriptional acti- 
vator complex HAP2,3,4 has been shown 
to be responsible for induction of several 
genes important for respiration (26-28). 
This complex binds a degenerate consensus 
sequence known as the CCAAT box (26). 
Computer analysis, using the consensus se- 
quence TNRYTGGB (29), has suggested 
that a large number of genes involved in 
respiration may be specific targets of 
HAP2,3,4 (30). Indeed, a putative 
HAP2,3,4 binding site could be found in 
the sequences upstream of each of the seven 
cytochrome c-related genes that showed 
the greatest magnitude of induction (Fig. 
5D). Of 12 additional cytochrome c-related 
genes that were induced, HAP2,3,4 binding 
sites were present in all but one. Signifi- 
cantly, we found that transcription of 
HAP4 itself was induced nearly ninefold 
concomitant with the diauxic shift. 
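
A degenerate consensus such as TNRYTGGB can be searched for by expanding each IUPAC ambiguity code into a character class. The sketch below is one way such a scan might be written; it is not the computer analysis cited in the text. The function names, the choice to scan both strands, and the short test sequence are assumptions made for illustration.

```python
import re

# Standard IUPAC nucleotide codes needed for the consensus in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "B": "[CGT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(upstream: str, consensus: str = "TNRYTGGB"):
    """Return sorted (position, strand) hits of `consensus` in `upstream`.

    Positions are 0-based offsets on the forward strand. Scanning both
    strands is an assumption of this sketch.
    """
    # A zero-width lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=" + "".join(IUPAC[c] for c in consensus) + ")")
    seq = upstream.upper()
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    n, k = len(seq), len(consensus)
    # Map reverse-strand match offsets back to forward-strand coordinates.
    hits += [(n - m.start() - k, "-")
             for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)


# Made-up sequence containing one forward-strand match (TCACTGGC).
print(find_sites("AAATCACTGGCAAA"))
```

In practice the input would be the region upstream of each open reading frame; how far upstream to look is a separate choice that the passage above does not specify.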

Control of ribosomal protein biogenesis 
is mainly exerted at the transcriptional 
level, through the presence of a common 
upstream-activating element (UAS) 
that is recognized by the Rap1 DNA-bind- 
ing protein (31, 32). The expression pro- 
files of seven ribosomal proteins are shown 
in Fig. 5F. A search of the sequences 
upstream of all seven genes revealed con- 
sensus Rap1-binding motifs (33). It has 
been suggested that declining Rap1 levels 
in the cell during starvation may be re- 
sponsible for the decline in ribosomal pro- 
tein gene expression (34). Indeed, we ob- 
served that the abundance of RAP1 
mRNA diminished by 4.4-fold, at about 
the time of glucose exhaustion. 

Of the 149 genes that encode known or 
putative transcription factors, only two, 
HAP4 and SIP4, were induced by a factor of 
more than threefold at the diauxic shift. 
SIP4 encodes a DNA-binding transcrip- 
tional activator that has been shown to 
interact with Snf1, the "master regulator" of 
glucose repression (35). The eightfold in- 
duction of SIP4 upon depletion of glucose 
strongly suggests a role in the induction of 
downstream genes at the diauxic shift. 

Although most of the transcriptional 
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were re- 
producible in duplicate experiments. For 
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expres- 
sion ratios measured in these duplicate 
experiments differed by less than a factor 
of 2. However, in a few cases, there were 
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
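
The two reproducibility figures quoted above (a correlation coefficient between duplicate sets of expression ratios, and the fraction of genes whose duplicate ratios differ by less than a factor of 2) can be computed along the following lines. This is a sketch under stated assumptions: the source does not say whether the correlation was taken on raw or log-transformed ratios, so the use of log2 ratios here is an assumption, and the function names and example values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_replicates(ratios_1: Sequence[float],
                       ratios_2: Sequence[float],
                       fold: float = 2.0) -> Tuple[float, float]:
    """Return (correlation of log2 ratios, fraction agreeing within `fold`)."""
    log_1 = [math.log2(r) for r in ratios_1]
    log_2 = [math.log2(r) for r in ratios_2]
    limit = math.log2(fold)
    # Two ratios differ by less than `fold` when their logs differ by < log2(fold).
    agree = sum(1 for a, b in zip(log_1, log_2) if abs(a - b) < limit)
    return pearson(log_1, log_2), agree / len(log_1)


# Hypothetical ratios for four genes measured in two hybridizations.
rep_1 = [1.0, 2.0, 4.0, 0.5]
rep_2 = [1.0, 2.0, 4.0, 2.0]
r, frac = compare_replicates(rep_1, rep_2)
print(f"r = {r:.2f}, within twofold = {frac:.0%}")
```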

The changes in gene expression during 
this diauxic shift are complex and involve 
integration of many kinds of information 
about the nutritional and metabolic state 
of the cell. The large number of genes 
whose expression is altered and the diver- 
sity of temporal expression profiles ob- 
served in this experiment highlight the 
challenge of understanding the underlying 
regulatory mechanisms. One approach to 
defining the contributions of individual 
regulatory genes to a complex program of 
this kind is to use DNA microarrays to 
identify genes whose expression is affected 



Fig. 2. The section of the ar- 
ray indicated by the gray box 
in Fig. 1 is shown for each of 
the experiments described 
here. Representative genes 
are labeled. In each of the ar- 
rays used to analyze gene 
expression during the diauxic 
shift, red spots represent 
genes that were induced rel- 
ative to the initial timepoint, 
and green spots represent 
genes that were repressed 
relative to the initial timepoint. 
In the arrays used to analyze 
the effects of the tup1Δ mu- 
tation and YAP1 overexpres- 
sion, red spots represent 
genes whose expression was 
increased, and green spots 
represent genes whose ex- 
pression was decreased by 
the genetic modification. Note 
that distinct sets of genes are 
induced and repressed in the 
different experiments. The 
complete images of each of 
these arrays can be viewed on 
the Internet (13). Cell density 
as measured by optical densi- 
ty (OD) at 600 nm was used to 
measure the growth of the 
culture. 

by mutation in each putative regulatory 
gene. As a test of this strategy, we analyzed 
the genomewide changes in gene expression 
that result from deletion of the TUP1 gene. 
Transcriptional repression of many genes by 
glucose requires the DNA-binding repressor 
Mig1 and is mediated by recruiting the tran- 
scriptional co-repressors Tup1 and Cyc8/ 
Ssn6 (39). Tup1 has also been implicated in 
repression of oxygen-regulated, mating-type- 
specific, and DNA-damage-inducible genes 
(40). 

Fig. 3. Metabolic reprogramming inferred from global analysis of changes in gene expression. Only key 
metabolic intermediates are identified. The yeast genes encoding the enzymes that catalyze each step 
in this metabolic circuit are identified by name in the boxes. The genes encoding succinyl-CoA synthase 
and glycogen-debranching enzyme have not been explicitly identified, but the ORFs YGR244 and 
YPR184 show significant homology to known succinyl-CoA synthase and glycogen-debranching en- 
zymes, respectively, and are therefore included in the corresponding steps in this figure. Red boxes with 
white lettering identify genes whose expression increases in the diauxic shift. Green boxes with dark 
green lettering identify genes whose expression diminishes in the diauxic shift. The magnitude of 
induction or repression is indicated for these genes. For multimeric enzyme complexes, such as 
succinate dehydrogenase, the indicated fold-induction represents an unweighted average of all the 
genes listed in the box. Black and white boxes indicate no significant differential expression (less than 
twofold). The direction of the arrows connecting reversible enzymatic steps indicates the direction of the 
flow of metabolic intermediates, inferred from the gene expression pattern, after the diauxic shift. Arrows 
representing steps catalyzed by genes whose expression was strongly induced are highlighted in red. 
The broad gray arrows represent major increases in the flow of metabolites after the diauxic shift, 
inferred from the indicated changes in gene expression. 

Wild-type yeast cells and cells bearing 
a deletion of the TUP1 gene (tup1Δ) were 
grown in parallel cultures in rich medium 
containing glucose as the carbon source. 
Messenger RNA was isolated from expo- 
nentially growing cells from the two pop- 
ulations and used to prepare cDNA la- 
beled with Cy3 (green) and Cy5 (red), 
respectively (11). The labeled probes were 
mixed and simultaneously hybridized to 
the microarray. Red spots on the microar- 
ray therefore represented genes whose 
transcription was induced in the tup1Δ 
strain, and thus presumably repressed by 
Tup1 (41). A representative section of the 
microarray (Fig. 2, bottom middle panel) 
illustrates that the genes whose expression 
was affected by the tup1Δ mutation were, 
in general, distinct from those induced 
upon glucose exhaustion [complete images 
of all the arrays shown in Fig. 2 are avail- 
able on the Internet (13)]. Nevertheless, 
34 (10%) of the genes that were induced 
by a factor of at least 2 after the diauxic 
shift were similarly induced by deletion of 
TUP1, suggesting that these genes may be 
subject to TUP1-mediated repression by 
glucose. For example, SUC2, the gene en- 
coding invertase, and all five hexose trans- 
porter genes that were induced during the 
course of the diauxic shift were similarly 
induced, in duplicate experiments, by the 
deletion of TUP1. 

The set of genes affected by Tup1 in this 
experiment also included α-glucosidases, 
the mating-type-specific genes MFA1 and 
MFA2, and the DNA damage-inducible 
RNR2 and RNR4, as well as genes involved 
in flocculation and many genes of unknown 
function. The hybridization signal corre- 
sponding to expression of TUP1 itself was 
also severely reduced because of the (in- 
complete) deletion of the transcription unit 
in the tup1Δ strain, providing a positive 
control in the experiment (42). 

Many of the transcriptional targets of 
Tup1 fell into sets of genes with related 
biochemical functions. For instance, al- 
though only about 3% of all yeast genes 
appeared to be TUP1-repressed by a factor 
of more than 2 in duplicate experiments 
under these conditions, 6 of the 13 genes 
that have been implicated in flocculation 
(15) showed a reproducible increase in 
expression of at least twofold when TUP1 
was deleted. Another group of related 
genes that appeared to be subject to TUP1 
repression encodes the serine-rich cell 
wall mannoproteins, such as Tip1 and 
Tir1/Srp1, which are induced by cold 
shock and other stresses (43), and similar, 
serine-poor proteins, the seripauperins 
(44). Messenger RNA levels for 23 of the 
26 genes in this group were reproducibly 
elevated by at least 2.5-fold in the tup1Δ 
strain, and 18 of these genes were induced 
by more than sevenfold when TUP1 was 
deleted. In contrast, none of 83 genes that 
could be classified as putative regulators of 
the cell division cycle were induced more 
than twofold by deletion of TUP1. Thus, 
despite the diversity of the regulatory sys- 
tems that employ Tup1, most of the genes 
that it regulates under these conditions 
fall into a limited number of distinct func- 
tional classes. 

Because the microarray allows us to 
monitor expression of nearly every gene in 
yeast, we can, in principle, use this ap- 
proach to identify all the transcriptional 
targets of a regulatory protein like Tup1. It 
is important to note, however, that in any 
single experiment of this kind we can only 
recognize those target genes that are nor- 
mally repressed (or induced) under the 
conditions of the experiment. For in- 
stance, the experiment described here an- 
alyzed a MATα strain in which MFA1 
and MFA2, the genes encoding the a- 
factor mating pheromone precursor, are 
normally repressed. In the isogenic tup1Δ 
strain, these genes were inappropriately 
expressed, reflecting the role that Tup1 
plays in their repression. Had we instead 
carried out this experiment with a MATa 
strain (in which expression of MFA1 and 
MFA2 is not repressed), it would not have 
been possible to conclude anything re- 
garding the role of Tup1 in the repression 
of these genes. Conversely, we cannot dis- 
tinguish indirect effects of the chronic 
absence of Tup1 in the mutant strain from 
effects directly attributable to its partici- 
pation in repressing the transcription of a 
gene. 

Another simple route to modulating the 
activity of a regulatory factor is to overex- 
press the gene that encodes it. YAP1 en- 
codes a DNA-binding transcription factor 
belonging to the b-zip class of DNA-bind- 
ing proteins. Overexpression of YAP1 in 
yeast confers increased resistance to hydro- 
gen peroxide, o-phenanthroline, heavy 
metals, and osmotic stress (45). We ana- 
lyzed differential gene expression between a 
wild-type strain bearing a control plasmid 
and a strain with a plasmid expressing YAP1 
under the control of the strong GAL1-10 
promoter, both grown in galactose (that is, 
a condition that induces YAP1 overexpres- 
sion). Complementary DNA from the con- 
trol and YAP1-overexpressing strains, la- 
beled with Cy3 and Cy5, respectively, was 
prepared from mRNA isolated from the two 
strains and hybridized to the microarray. 
Thus, red spots on the array represent genes 
that were induced in the strain overexpress- 
ing YAP1. 

Of the 17 genes whose mRNA levels 
increased by more than threefold when 
YAP1 was overexpressed in this way, five 
bear homology to aryl-alcohol oxidoreduc- 
tases (Fig. 2 and Table 1). An additional 
four of the genes in this set also belong to 
the general class of dehydrogenases/oxi- 
doreductases. Very little is known about 
the role of aryl-alcohol oxidoreductases in 
S. cerevisiae, but these enzymes have been 
isolated from ligninolytic fungi, in which 
they participate in coupled redox reac- 
tions, oxidizing aromatic and aliphatic 
unsaturated alcohols to aldehydes with the 
production of hydrogen peroxide (46, 47). 
The fact that a remarkable fraction of the 
targets identified in this experiment be- 
long to the same small, functional group of 
oxidoreductases suggests that these genes 



Fig. 4. Coordinated reg- 
ulation of functionally re- 
lated genes. The curves 
represent the average in- 
duction or repression ra- 
tios for all the genes in 
each indicated group. 
The total number of 
genes in each group was 
as follows: ribosomal 
proteins, 112; translation 
elongation and initiation 

might play an important protective role 
during oxidative stress. Transcription of a 
small number of genes was reduced in the 
strain overexpressing Yap1. Interestingly, 
many of these genes encode sugar per- 
meases or enzymes involved in inositol 
metabolism. 

We searched for Yap1-binding sites 
(TTACTAA or TGACTAA) in the se- 
quences upstream of the target genes we 
identified (48). About two-thirds of the 
genes that were induced by more than 
threefold upon Yap1 overexpression had 
one or more binding sites within 600 bases 
upstream of the start codon (Table 1), sug- 
gesting that they are directly regulated by 
Yap1. The absence of canonical Yap1-bind- 
ing sites upstream of the others may reflect 
an ability of Yap1 to bind sites that differ 
from the canonical binding sites, perhaps in 
cooperation with other factors, or less like- 
ly, may represent an indirect effect of Yap1 
overexpression, mediated by one or more 
intermediary factors. Yap1 sites were found 
only four times in the corresponding region 
of an arbitrary set of 30 genes that were not 
differentially regulated by Yap1. 

Use of a DNA microarray to character- 
ize the transcriptional consequences of 
mutations affecting the activity of regula- 
tory molecules provides a simple and pow- 
erful approach to dissection and character- 
ization of regulatory pathways and net- 
works. This strategy also has an important 
practical application in drug screening. 
Mutations in specific genes encoding can- 
didate drug targets can serve as surrogates 
for the ideal chemical inhibitor or modu- 
lator of their activity. DNA microarrays 
can be used to define the resulting signa- 
ture pattern of alterations in gene expres- 
sion, and then subsequently used in an 
assay to screen for compounds that repro- 
duce the desired signature pattern. 

DNA microarrays provide a simple and 
economical way to explore gene expres- 
sion patterns on a genomic scale. The 
hurdles to extending this approach to any 
other organism are minor. The equipment 

Fig. 5. Distinct temporal patterns of induction or repression help to group genes that share regulatory 
properties. (A) Temporal profile of the cell density, as measured by OD at 600 nm, and glucose 
concentration in the media. (B) Seven genes exhibited a strong induction (greater than ninefold) only at 
the last timepoint (20.5 hours). With the exception of IDP2, each of these genes has a CSRE UAS. There 
were no additional genes observed to match this profile. (C) Seven members of a class of genes marked 
by early induction with a peak in mRNA levels at 18.5 hours. Each of these genes contains STRE motif 
repeats in its upstream promoter region. (D) Cytochrome c oxidase and ubiquinol cytochrome c 
reductase genes. Marked by an induction coincident with the diauxic shift, each of these genes contains 
a consensus binding motif for the HAP2,3,4 protein complex. At least 17 genes shared a similar 
expression profile. (E) SAM1, GPP1, and several genes of unknown function are repressed before the 
diauxic shift, and continue to be repressed upon entry into stationary phase. (F) Ribosomal protein 
genes compose a large class of genes that are repressed upon depletion of glucose. Each of the genes 
profiled here contains one or more RAP1-binding motifs upstream of its promoter. RAP1 is a transcrip- 
tional regulator of most ribosomal proteins. 

required for fabricating and using DNA 
microarrays (9) consists of components 
that were chosen for their modest cost and 
simplicity. It was feasible for a small group 
to accomplish the amplification of more 
than 6000 genes in about 4 months and, 
once the amplified gene sequences were in 
hand, only 2 days were required to print a 
set of 110 microarrays of 6400 elements 
each. Probe preparation, hybridization, 
and fluorescent imaging are also simple 
procedures. Even conceptually simple ex- 
periments, as we described here, can yield 
vast amounts of information. The value of 
the information from each experiment of 
this kind will progressively increase as 
more is learned about the functions of 
each gene and as additional experiments 
define the global changes in gene expres- 
sion in diverse other natural processes and 
genetic perturbations. Perhaps the greatest 
challenge now is to develop efficient 
methods for organizing, distributing, inter- 
preting, and extracting insights from the 
large volumes of data these experiments 
will provide. 

We describe here a method for drug target validation and identification of secondary drug tar- 
get effects based on genome-wide gene expression patterns. The method is demonstrated by 
several experiments, including treatment of yeast mutant strains defective in calcineurin, im- 
munophilins or other genes with the immunosuppressants cyclosporin A or FK506. Presence or 
absence of the characteristic drug 'signature' pattern of altered gene expression in drug-treated 
cells with a mutation in the gene encoding a putative target established whether that target was 
required to generate the drug signature. Drug-dependent effects were seen in 'targetless' cells, 
showing that FK506 affects additional pathways independent of calcineurin and the im- 
munophilins. The described method permits the direct confirmation of drug targets and recog- 
nition of drug-dependent changes in gene expression that are modulated through pathways 
distinct from the drug's intended target. Such a method may prove useful in improving the effi- 
ciency of drug development programs. 



Good drugs are potent and specific; that is, they must have 
strong effects on a specific biological pathway and minimal ef- 
fects on all other pathways. Confirmation that a compound in- 
hibits the intended target (drug target validation) and the 
identification of undesirable secondary effects are among the 
main challenges in developing new drugs. Comprehensive 
methods that enable researchers to determine which genes or 
activities are affected by a given drug might improve the effi- 
ciency of the drug discovery process by quickly identifying po- 
tential protein targets, or by accelerating the identification of 
compounds likely to be toxic. DNA microarray technology, 
which permits simultaneous measurement of the expression 
levels of thousands of genes, provides a comprehensive frame- 
work to determine how a compound affects cellular metabolism 
and regulation on a genomic scale (1-11). DNA microarrays that 
contain essentially every open reading frame (ORF) in the 
Saccharomyces cerevisiae genome have already been used success- 
fully to explore the changes in gene expression that accompany 
large changes in cellular metabolism or cell cycle progression (7-10). 

In the modern drug discovery paradigm, which typically be- 
gins with the selection of a single molecular target, the ideal in- 
hibitory drug is one that inhibits a single gene product so 
completely and so specifically that it is as if the gene product 
were absent. Treating cells with such a drug should induce 
changes in gene expression very similar to those resulting from 
deleting the gene encoding the drug's target. Here we have com- 
pared the genome-wide effects on gene expression that result 
from deletions of various genes in the budding yeast S. cerevisiae 
to the effects on gene expression that result from treatment 



with known inhibitors of those gene products. Using the cal- 
cineurin signaling pathway as a model system, we tested an ap- 
proach that permits identification of genes that encode proteins 
specifically involved in pathways affected by a drug. The FK506 
characteristic pattern, or 'signature', of altered gene expression 
was not observed in mutant cells lacking proteins inhibited by 
FK506 (for example, a calcineurin or FK506-binding-protein 
mutant strain), but was observed in mutants deleted for genes 
in pathways unrelated to FK506 action (for example, a cy- 
clophilin mutant strain). Conversely, the cyclosporin A (CsA) 
signature was not observed in CsA-treated calcineurin or cy- 
clophilin mutant strains, but was seen in an FK506-binding-pro- 
tein mutant strain treated with CsA. The method also 
demonstrates that FK506, a clinically used immunosuppressant, 
has 'off-target' effects that are independent of its binding to im- 
munophilins. Thus, the approach we describe may provide a 
way to identify the pathways altered by a drug and to detect 
drug effects mediated through unintended targets. 

Null mutants phenocopy drug-treated cells on a genomic scale 
To test whether a null mutation in a drug target serves as a 
model of an ideal inhibitory drug, we examined the effects on 
gene expression associated with pharmacological or genetic in- 
hibition of calcineurin function. Calcineurin is a highly con- 
served calcium- and calmodulin-activated serine/threonine 
protein phosphatase implicated in diverse processes dependent 
on calcium signaling (12,13). In budding yeast, calcineurin is re- 
quired for intracellular ion homeostasis (14), for adaptation to pro- 
longed mating pheromone treatment (15) and in the regulation of 

Fig. 1. Model of antagonism of the calcineurin signaling pathway mediated 
by FK506 and cyclosporin A (CsA). Calcineurin activity is composed of a cat- 
alytic subunit (calcineurin A, encoded in yeast by the CNA1 and CNA2 genes), 
and calcium-binding regulatory subunits calmodulin (CMD) and calcineurin B 
(CnB). After entering cells, FK506 and CsA specifically bind and inhibit the 
peptidyl-proline isomerase activity of their respective immunophilins, FK506 
binding proteins (FKBP) and cyclophilins (CyP). The most abundant im- 
munophilins in yeast (Fpr1 and Cph1) are thought to mediate calcineurin in- 
hibition. Drug-immunophilin complexes bind and inhibit the calcium- and 
calmodulin-stimulated phosphatase calcineurin. Among the substrates of cal- 
cineurin are transcriptional activators that act to modulate gene expression. 



the onset of mitosis 16. In mammals, calcineurin has been implicated
in T-cell activation 12, in apoptosis 17, in cardiac hypertrophy 18
and in the transition from short-term to long-term
memory 19. In both organisms, calcineurin activity is inhibited
by FK506 and CsA, immunosuppressant drugs whose effects on
calcineurin are mediated through families of intracellular receptor
proteins called immunophilins 12,20 (Fig. 1). To assess the effects
of pharmacologic inhibition of calcineurin, wild-type S.
cerevisiae was grown to early logarithmic phase in the presence
or absence of FK506 or CsA. Isogenic cells, from which the
genes encoding the catalytic subunits of calcineurin (CNA1 and
CNA2) had been deleted 21 (referred to as the cna or calcineurin
mutant), were grown in parallel, in the absence of the drug.
Fluorescently labeled cDNA was prepared by reverse transcription
of polyA+ RNA in the presence of Cy3- or Cy5-deoxynucleotide
triphosphates and then hybridized to a microarray
containing more than 6,000 DNA probes representing 97% of
the known or predicted ORFs in the yeast genome.
Simultaneous hybridization of Cy5-labeled cDNA from
mock-treated cells and Cy3-labeled cDNA from cells treated with 1
µg/ml FK506 allowed the effect of drug treatment on mRNA levels
of each ORF to be determined (Fig. 2a and b and data not
shown). Similarly, effects of the calcineurin mutations on the
mRNA levels of each gene were assessed by simultaneous hybridization
of Cy5-labeled cDNA from wild-type cells and Cy3-labeled
cDNA from the calcineurin mutant strain (Fig. 2c). For
each comparison of this kind, reported expression ratios are the
average of at least two hybridizations in which the Cy3 and Cy5
fluors were reversed to remove biases that may be introduced by
gene-specific differences in incorporation of the two fluors
(data not shown).
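
The fluor-reversal averaging described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: averaging in log10 space, the function name and the example intensities are all assumptions made here. It shows why a pair of reversed hybridizations cancels a multiplicative dye bias.

```python
import math


def dye_swap_log_ratio(forward, swapped):
    """Combine a fluor-reversed pair into one log10 expression ratio.

    forward: (cy5, cy3) intensities with the reference sample labeled Cy5.
    swapped: (cy5, cy3) intensities with the reference sample labeled Cy3.
    Both ratios are oriented reference/treated before averaging.
    Averaging in log space is an assumption of this sketch.
    """
    cy5_f, cy3_f = forward
    cy5_s, cy3_s = swapped
    if min(cy5_f, cy3_f, cy5_s, cy3_s) <= 0:
        raise ValueError("intensities must be positive")
    log_forward = math.log10(cy5_f / cy3_f)  # reference is in the Cy5 channel
    log_swapped = math.log10(cy3_s / cy5_s)  # reference is in the Cy3 channel
    return (log_forward + log_swapped) / 2.0


# Hypothetical spot: true reference/treated ratio of 2, with Cy5 reading 1.5x too bright.
combined = dye_swap_log_ratio(forward=(3.0, 1.0), swapped=(1.5, 2.0))
print(round(10 ** combined, 3))
```

In the hypothetical spot, the bias inflates one hybridization and deflates the other by the same factor, so the averaged log ratio recovers the underlying twofold difference.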

Treatment with FK506 in these growth conditions resulted in 
a signature pattern of altered gene expression in which mRNA 
levels of 36 ORFs changed by more than twofold 
(http://www.rosetta.org). A very similar pattern of altered gene 
expression was observed when the calcineurin mutant strain 
was compared to wild-type cells. Comparison of the changes in 
mRNA expression of each gene resulting from treatment of 
wild-type cells with FK506 with mRNA expression changes re- 
sulting from deletion of the calcineurin genes showed the con- 
siderable similarity of the global transcript alterations in 
response to the two perturbations (Fig. 2o-d). Quantification of 
this similarity using the correlation coefficient (ρ) showed
large correlations between the FK506 treatment signature and
the calcineurin deletion signature (ρ = 0.75 ± 0.03), as well as
the CsA treatment signature (ρ = 0.94 ± 0.02), but not with a
randomly selected deletion mutant strain (deleted for the
YER071C gene; ρ = -0.07 ± 0.04; Fig. 2e). The FK506 treatment
signature was also compared with those of more than 40 other
deletion mutant strains or drug treatments thought to affect
unrelated pathways, and none had statistically significant
correlations. These data establish that genetic disruption of
calcineurin function provides a close and specific phenocopy of
treatment with FK506 or CsA.
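
A signature correlation of this kind can be sketched as a plain Pearson correlation over per-ORF log10 expression ratios. The passage does not give the exact formula or the way the ± values were derived, so the standard-error estimate below is the textbook large-sample approximation and is an assumption, as are the function name and the example values.

```python
import math


def signature_correlation(x, y):
    """Pearson correlation between two lists of log10 expression ratios.

    Returns (r, se), where se = sqrt((1 - r^2) / (n - 2)) is a textbook
    approximation of the standard error (an assumption of this sketch).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 ORFs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a signature with no variation has no defined correlation")
    r = sxy / math.sqrt(sxx * syy)
    r = max(-1.0, min(1.0, r))  # guard against floating-point overshoot
    se = math.sqrt((1.0 - r * r) / (n - 2))
    return r, se


# Hypothetical log10 ratios for five ORFs in a drug signature and a mutant signature.
drug = [0.45, -0.30, 0.05, 0.60, -0.10]
mutant = [0.40, -0.25, 0.00, 0.70, -0.20]
r, se = signature_correlation(drug, mutant)
print(f"r = {r:.2f} +/- {se:.2f}")
```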

To avoid generalizing from a single example, we also compared
the effects of treatment of wild-type cells with 3-aminotriazole
(3-AT) with the effects of deletion of the HIS3 gene. HIS3
encodes imidazoleglycerol phosphate dehydratase, which catalyzes
the seventh step of the histidine biosynthetic pathway in
yeast 22; 3-AT is a competitive inhibitor of this enzyme that triggers
a large transcriptional amino-acid starvation response".
Microarray analysis of wild-type and isogenic his3-deficient
strains demonstrated the expected large genome-wide transcriptional
responses (involving more than 1,000 ORFs) resulting
from treatment with 3-AT (Fig. 3a) or from HIS3 deletion (Fig.
3c). Quantitative comparison of the 3-AT treatment signature
and the his3 mutant signature showed a high level of correlation
(ρ = 0.76 ± 0.02) that even extended to genes that experienced
small changes in expression level (Fig. 3d). As a negative control,
the correlations between the 3-AT treatment signature or the
his3 mutant signature and the calcineurin mutant strain were
not statistically significant (ρ = 0.09 ± 0.06 and -0.01 ± 0.04,
respectively). That both the calcineurin/FK506 and the his3/3-AT
comparisons were highly correlated indicates that in many cases
the expression profile resulting from a gene deletion closely
resembles the expression profile of wild-type cells treated with an
inhibitor of that gene's product.

'Decoder' strategy: Drug target validation with deletion mutants
Because pharmacological inhibition of different targets might
give similar or identical expression profiles, simple comparison
of drug signatures to mutant signatures is unlikely to unambiguously
identify a drug's target. To overcome this limitation, an
additional 'decoder' step is used. We first compare the expression
profile of wild-type drug-treated cells to the expression profiles
from a panel of genetic mutant strains, using a correlation
coefficient metric. Mutant strains whose expression profile is
similar to that of drug-treated wild-type cells are selected and
subjected to drug treatment, generating the drug signature in
the mutant strain (that is, the mutant drug signature). If the
mutated gene encodes a protein involved in a pathway affected
by the drug, we expect the drug signature in mutant cells to be
different (or absent, for an ideal drug) from the drug signature
seen in wild-type cells.
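
The two steps of the 'decoder' strategy described above can be sketched as a pair of filters over correlation coefficients. The cut-offs (0.5 and 0.3) and the function names are hypothetical and chosen only for illustration; the text applies no fixed numerical threshold. The example correlations for the calcineurin (cna) and yer071c strains are the values reported in this article.

```python
def select_similar_mutants(mutant_vs_drug, min_corr=0.5):
    """Step 1: keep mutants whose own signature resembles the wild-type drug signature.

    mutant_vs_drug maps strain -> correlation between the mutant signature
    and the drug signature in wild-type cells. min_corr is a hypothetical cut-off.
    """
    return sorted(s for s, rho in mutant_vs_drug.items() if rho >= min_corr)


def drug_signature_lost(drug_in_mutant_vs_wt, strains, max_corr=0.3):
    """Step 2: flag mutants in which drug treatment no longer gives the wild-type signature.

    drug_in_mutant_vs_wt maps strain -> correlation between the drug signature
    in that mutant and the drug signature in wild-type cells.
    max_corr is a hypothetical cut-off.
    """
    return {s: drug_in_mutant_vs_wt[s] <= max_corr for s in strains}


# Correlations reported in the text for FK506.
step1 = {"cna": 0.75, "yer071c": -0.07}   # mutant signature vs. wild-type drug signature
step2 = {"cna": -0.01}                    # drug signature in mutant vs. in wild type

candidates = select_similar_mutants(step1)
print(candidates)
print(drug_signature_lost(step2, candidates))
```

A strain that passes step 1 and is flagged in step 2 is one whose deleted gene plausibly lies in the pathway the drug acts through.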
Fig. 2 Expression profiles from FK506-treated wild-type (wt)
cells and a calcineurin-disruption mutant strain share a genome-wide
correlation. DNA microarray analysis showing changes in gene
expression resulting from FK506 treatment (a and b) or from genetic
disruption of genes encoding calcineurin (c). a, Pseudo-color
image of the results of simultaneous hybridization of Cy5-labeled
cDNA (red) from mock-treated strain R563 and Cy3-labeled cDNA
(green) from strain R563 treated with 1 µg/ml FK506.
b, Enlarged view of the boxed area in a. Arrowheads indicate
specific ORFs induced or repressed. c, Pseudo-color
image of the results of simultaneous hybridization
of Cy5-labeled cDNA (red) from strain R563 and Cy3-labeled
cDNA (green) from strain MCY300 (deleted for
the CNA1, CNA2 catalytic subunits of calcineurin).
Arrows indicate specific ORFs induced or repressed. d,
The log10 of the expression ratio for each ORF derived
from the FK506 treatment hybridizations is plotted versus
the log10 of the expression ratio in the calcineurin
mutant hybridizations. ORFs that were induced or repressed
in both experiments are shown as green and
red dots, respectively. e, The log10 of the expression ratio for each ORF
derived from the FK506 treatment hybridizations is plotted versus the log10
of the expression ratio in the yer071c mutant hybridizations. No ORFs
were induced or repressed in both experiments.



To illustrate this, we treated the his3 mutant strain with 3-AT.
The signature pattern of altered gene expression resulting
from treatment of the mutant strain with 3-AT was much less
complex than that of the 3-AT signature in wild-type cells (Fig.
4). This is seen simply by examining plots of mean intensity of
the hybridization signal (which approximately reflects level of
expression) versus the expression ratio for each ORF (Fig. 4).
Genes that were expressed at higher or lower levels in 3-AT
treated cells or in his3 mutant cells are shown as red and green
dots, respectively. We analyzed the 3-AT signature in wild-type
(Fig. 4a) and his3 mutant cells (Fig. 4c), as well as the his3 mutant
strain signature (Fig. 4b). Whereas histidine limitation
induced by 3-AT induced more than 1,000 transcription-level
changes in the wild-type strain, few or no transcript level
changes were induced by treatment of the his3-deletion strain
with 3-AT. This indicates that with the growth conditions used,
essentially all of the effects of 3-AT depend on or are mediated
through the HIS3 gene product.

Applying this approach to the calcineurin signaling pathway
showed the specificity of the method. The calcineurin mutant
strain and strains with deletions in the genes encoding the
most abundant immunophilins in yeast 12 (CPH1 and FPR1)
were treated with either FK506 or CsA to determine the profiles



Table 1 Signature correlation of expression ratios as a result of FK506
treatment in various mutant strains

                      wild-type     cna            fpr1           cna fpr1      cph1
                      +/-FK506      +/-FK506       +/-FK506       +/-FK506      +/-FK506
wild-type +/-FK506    0.93 ± 0.04   -0.01 ± 0.07   -0.23 ± 0.07   0.12 ± 0.07   0.79 ± 0.03

Signature correlation shows the absence of the FK506 signature specifically in the calcineurin (cna) and fpr1
(major FK506 binding protein) deletion mutants. cna represents the mutant with deletions of the catalytic
subunits of calcineurin, CNA1 and CNA2. The correlation coefficient reported in the first column represents the
correlation between two pairs of hybridizations from independent wild-type +/- FK506 experiments.



of altered gene expression resulting from drug treatment of the
mutant cells (that is, mutant +/- drug). We compared the drug
signatures in the mutants to the wild-type drug signature using
the correlation coefficient metric (Table 1). Although the signature
generated by treatment of wild-type cells with FK506 was
highly correlated to the calcineurin mutant strain signature
(ρ = 0.75 ± 0.03), it bore no similarity to the profile after treatment
of the calcineurin mutant strain with FK506 (ρ = -0.01 ±
0.07). This indicates that FK506 was unable to elicit its normal
transcriptional response in the calcineurin mutant strain.
Likewise, treatment of the fpr1 mutant strain with FK506
elicited an expression profile that was not correlated to the
FK506 signature in the wild-type strain (ρ = -0.23 ± 0.07),
indicating that the FPR1 gene product is likely to be involved in the
pathway affected by FK506. The same was true for the cna fpr1
mutant strain. In contrast, treatment of the cph1 mutant strain
with FK506 generated an expression profile highly correlated
with the wild-type FK506 expression profile (ρ = 0.79 ± 0.03),
indicating that the cph1 mutation did not block the mode of action
of FK506 and thus that CPH1 is not directly involved in the pathway
affected by FK506. We tabulated the change in expression in
response to FK506 in different mutant strains for all ORFs with
expression ratios greater than 1.8 in FK506-treated cells or in
the calcineurin mutant strain (Fig. 5a). The
calcineurin mutant strain signature and the
FK506 responses in wild-type and the cph1
mutant strain are similar, and there are no
transcript-level changes (seen in black) for
treatment of the calcineurin, fpr1 and cna
fpr1 mutant strains with FK506 (Fig. 5a).

Similar experiments and analyses with CsA
provided further validation of this approach.
The expression profile elicited by treatment
of wild-type cells with CsA was highly correlated
to the profile elicited by mutation of the calcineurin genes
(ρ = 0.71 ± 0.04), but did not correlate with the expression profile
resulting from treatment of the calcineurin mutant strain
with CsA (ρ = -0.05 ± 0.07; Table 2), indicating that the genetic
deletion of calcineurin interfered with the ability of CsA to
elicit its normal transcriptional response. Likewise, the CsA signature
was essentially absent in CsA-treated cph1 mutant cells,
and the expression profile of CsA-treated cph1 mutant cells correlated
poorly to that of CsA-treated wild-type cells (ρ = 0.18 ±
0.07). Thus, the CPH1 gene product was required for the CsA response
seen in wild-type cells. Conversely, treatment of fpr1
mutant cells with CsA resulted in an expression pattern very
similar to the profile of CsA-treated wild-type cells (ρ = 0.77 ±
0.03), indicating that FPR1 was not necessary for the CsA-mediated
effects. Analysis of individual ORFs affected by CsA and
their expression ratios over the entire set of experiments confirmed
that CPH1 and the genes encoding calcineurin, but not
FPR1, are necessary for the wild-type CsA response (Fig. 5b). The
observation that the profiles resulting from FK506 or CsA drug
treatment are similar to that of the calcineurin deletion mutant
strain might allow the prediction that calcineurin was involved
in the pathway affected by these drugs. But because the expression
profile of the fpr1 mutant strain did not bear a strong similarity
to the wild-type drug expression profile for FK506, it is
obvious that the drug treatment of the mutant strains was necessary
to identify Fpr1, but not Cph1, as a potential FK506 drug
target. In the same way, the 'decoder' strategy was necessary to
identify Cph1, but not Fpr1, as a potential drug target for CsA.

Fig. 3 Expression profiles from a his3 mutant strain and wild-type (wt)
cells treated with 3-AT share a genome-wide correlation. DNA microarray
analysis showing changes in gene expression resulting from 3-AT
treatment (a) or from genetic disruption of the HIS3 gene (c). a, Pseudo-color
image of the results of simultaneous hybridization of
Cy5-labeled cDNA (red) from mock-treated wild-type strain R491 and
Cy3-labeled cDNA (green) from strain R491 treated with 10 mM 3-AT.
b, The log10 of the expression ratio for each ORF derived from the
3-AT treatment hybridizations is plotted versus the log10 of the expression
ratio in the his3 mutant hybridizations. ORFs that were induced or repressed
in both experiments are shown as green and red dots, respectively.
The correlation of expression ratios applies not only to genes with
large expression ratios (for example, CHA1 and ARG1), but also extends to
genes with expression ratios less than 2 (for example, ILV1 and CPH1).
ILV1 is induced 1.9-fold and 1.5-fold, and CPH1 is downregulated 1.9-fold
and 1.7-fold, in cells treated with 3-AT and his3 mutant cells, respectively.
Two ORFs do not fall on the line x = y. The leftmost point is the HIS3 data
point, which is induced by 3-AT treatment but which is not absent from
the his3 mutant strain. The other point is YOR203w. Both data points are
labeled HIS3 because hybridization to YOR203w is most likely due to HIS3
mRNA, as YOR203w overlaps the HIS3 open reading frame. c, Pseudo-color
image of the results of simultaneous hybridization of Cy5-labeled
cDNA (red) from wild-type strain R491 and Cy3-labeled cDNA (green)
from strain R1226, deleted for the HIS3 gene. Arrowheads indicate specific
ORFs induced or repressed.

'Decoder' approach can identify secondary drug effects
For a drug that has a single biochemical target, the strategy outlined
above may be useful in target validation. In many cases,
however, a compound may affect multiple pathways and elicit
a very complex signature. 'Decoding' such a complex signature

Fig. 4 Treatment of the his3 mutant strain with 3-AT shows nearly complete
loss of 3-AT signature. A plot of the log10 of the mean intensity of
hybridization for each ORF versus the log10 of its expression ratio for each
experiment is shown next to a pseudo-color image of a representative
portion of the microarray. ORFs that are induced or repressed at the 95%
confidence level are shown in green and red, respectively. a, Expression
profile from treatment of the wild-type (wt) strain with 3-AT. Cy5-labeled
cDNA (red) from mock-treated strain R491 and Cy3-labeled cDNA
(green) from strain R491 treated with 10 mM 3-AT. b, Expression profile
from the his3 deletion strain. Cy5-labeled cDNA (red) from strain R491
and Cy3-labeled cDNA (green) from strain R1226, deleted for the HIS3
gene. c, Expression profile of treatment of the his3 deletion strain with
3-AT. Cy3-labeled cDNA (red) from his3-deleted strain R1226 and Cy5-labeled
cDNA (green) from strain R1226 treated with 10 mM 3-AT.
Arrowheads indicate the DNA probe and data point corresponding to the
HIS3 gene. The blue dashed line represents the threshold below which errors
tend to increase rapidly because spot intensities are not sufficiently
above background intensity.
Table 2 Signature correlation of expression ratios as a result of CsA
treatment in various mutant strains

                    wild-type     cna            fpr1          cna cph1                  cph1
                    +/-CsA        +/-CsA         +/-CsA        +/-CsA                    +/-CsA
wild-type +/-CsA    0.94 ± 0.04   -0.05 ± 0.07   0.77 ± 0.03   -0.[illegible] ± 0.07     0.18 ± 0.07

Signature correlation shows the absence of the CsA signature specifically in the calcineurin (cna) and cph1
(cyclophilin) deletion mutants. cna represents the mutant with deletions of the catalytic subunits of
calcineurin, CNA1 and CNA2. The correlation coefficient reported in the first column represents the correlation
between two pairs of hybridizations from independent wild-type CsA experiments.



into the effects mediated through the intended target (the 'on-target'
signature) and those mediated through unintended targets
(the 'off-target' signature) might be useful in evaluating a
compound's specificity. Our 'decoder' strategy is based on the
premise that the 'off-target' signature should be insensitive to the
genetic disruption of the primary target.

To determine whether the 'decoder' approach could identify
an 'off-target' profile, we looked for a drug-responsive gene
whose expression is insensitive to deletion of the primary target.
To increase the likelihood of observing such genes, the
same strains described in Tables 1 and 2 were treated with
higher concentrations (50 µg/ml) of FK506. This led to a much
more complex expression profile in wild-type cells, indicating
that at this higher concentration, FK506 was inhibiting or activating
additional targets. Several of the ORFs in this expanded
FK506-induced expression profile were not affected by the calcineurin,
cph1 or fpr1 mutations, as drug treatment of these mutant
strains did not block their presence in the FK506
expression signature (Fig. 6). This indicates that FK506 was triggering
changes in transcript levels of many genes through pathways
independent of calcineurin, CPH1 and FPR1. Many of the
upregulated ORFs in the 'off-target' pathway were genes reported
to be regulated by the transcriptional activator Gcn4
(ref. 24). In some strains, a reporter gene under GCN4 control
was induced in response to FK506 treatment". To determine
whether GCN4 is involved in this pathway that is independent
of calcineurin, CPH1 and FPR1, we analyzed the effects of treatment
with high-dose FK506 on global gene expression in a
strain with a GCN4 deletion (Fig. 6). Of the 41 ORFs with
calcineurin-independent expression ratios greater than 4, 32 were
not induced in the gcn4 mutant, indicating that their induction
by FK506 was GCN4-dependent. Not all GCN4-regulated genes
were induced by FK506. This FK506-induced subset of GCN4-regulated
genes may be those most sensitive to subtle changes
in Gcn4 levels, or perhaps other regulatory circuits prevent
FK506 activation of some GCN4-regulated genes. Seven of the
remaining nine ORFs induced by FK506 were independent of

Fig. 5 Response of FK506 and CsA signature genes in strains with deletions
in different genes. Genes with expression ratios greater than a factor of 1.8 in
response to treatment with 1 µg/ml FK506 (a) or 50 µg/ml CsA (b) are listed
(left side) and their expression ratios in the indicated strain are shown on the
green (induction)-red (repression) color scale. a, Calcineurin (cna) mutant
and FK506 treatment signature genes are in the first two columns. Almost all
FK506 signature genes have expression ratios near unity in deletion strains
involved in pathways affected by FK506 (calcineurin, fpr1 and cna fpr1 mutants)
but not in deletion strains in unrelated pathways (cph1). b, Calcineurin
(cna) mutant and CsA treatment signature genes are in the first two
columns. Almost all CsA signature genes have expression ratios near unity in
deletion strains involved in pathways affected by CsA (calcineurin, cph1 and
cna cph1 mutants) but not in deletion strains in unrelated pathways (fpr1).



both the calcineurin and GCN4 pathways. The
simplest explanation is that FK506 inhibits or
activates additional pathways. Members of this
class include SNQ2 and PDR5, genes that encode
drug efflux pumps with structural homology
to mammalian multiple drug resistance
proteins". FK506 may interact directly with
Pdr5 to inhibit its function". Our results indicate
that treatment with FK506 leads to fourfold-to-sixfold
induction of PDR5 mRNA levels.
YOR1, another gene that can confer drug resistance,
is also induced threefold-to-fourfold by
FK506. Thus, drug treatment of strains with mutations in the
primary targets can prove useful in identifying effects mediated
by secondary drug targets, including the nature and extent of
newly discovered and previously unsuspected pathways affected
by the drug.
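
The sorting of high-dose FK506 signature genes into calcineurin-dependent, GCN4-dependent and independent groups can be sketched as a simple decision rule. This is an illustrative assumption, not the authors' procedure: it handles induction only, the strain keys and function name are hypothetical, and the fold-change cut-off of 4 mirrors the "expression ratios greater than 4" criterion in the text. The example fold changes are invented.

```python
def classify_response(fold_change, threshold=4.0):
    """Assign one gene to a response class from its drug-induced fold change per strain.

    fold_change maps strain -> induction ratio (drug-treated / mock) and must
    contain the keys 'wt', 'cna', 'fpr1', 'cph1' and 'gcn4' (hypothetical labels).
    """
    responds = {strain: ratio > threshold for strain, ratio in fold_change.items()}
    if not responds["wt"]:
        return "not in wild-type signature"
    if responds["cna"] and responds["fpr1"] and responds["cph1"]:
        # Induction survives loss of calcineurin and of both immunophilins.
        if responds["gcn4"]:
            return "CNA- and GCN4-independent"
        return "GCN4-dependent"
    if (not responds["cna"] and not responds["fpr1"]
            and responds["cph1"] and responds["gcn4"]):
        return "CNA-dependent"
    return "complex"


# Invented fold changes for three hypothetical genes.
print(classify_response({"wt": 6.0, "cna": 5.5, "fpr1": 6.2, "cph1": 5.8, "gcn4": 1.1}))
print(classify_response({"wt": 5.0, "cna": 1.0, "fpr1": 1.2, "cph1": 4.8, "gcn4": 5.1}))
print(classify_response({"wt": 5.0, "cna": 5.0, "fpr1": 5.0, "cph1": 5.0, "gcn4": 5.0}))
```

A gene that matches none of the three patterns falls into a catch-all "complex" group, which keeps the rule from forcing ambiguous profiles into a pathway.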

We describe here a method for drug target validation and the
identification of secondary drug target effects that uses DNA
microarrays to survey the effects of drugs on global gene expression
patterns. We established that genetic and pharmacologic
inhibition of gene function can result in extremely similar
changes in gene expression. We also demonstrated that one can
confirm a potential drug target by treating a deletion mutant
defective in the gene encoding the putative target. Drug-mediated
signatures from strains with mutations in pathways or
processes directly or indirectly affected by the drug bore little or
no similarity to the wild-type drug expression profile. In contrast,
drug-mediated signatures from strains with mutations in
genes involved in pathways unrelated to the drug's action
showed extensive similarity to the wild-type drug signature. By
applying this approach to a drug that affects multiple pathways
(FK506), we were able to decode a complex signature into component
parts, including the identification of an 'off-target' signature
that was mediated through pathways independent of
calcineurin or the Fpr1 immunophilin.

Discussion 

It is well-established that high-throughput biochemical screening
can identify potent inhibitory compounds against a given
target. The 'decoder' approach described here complements
this process by evaluating the equally important property of
specificity: the tendency of a compound to inhibit pathways
other than that of its intended target. The ability to observe
such 'off-target' effects will likely be useful in several ways.
Profiling compounds with known toxicities will allow the development
of a database of expression changes associated with
particular toxicities. Recognition of potential toxicities in the
'off-target' signatures of otherwise promising compounds then
may allow earlier identification of those likely to fail in clinical
trials. Comparing the extent and peculiarities of 'off-target' signatures
of promising drug candidates could provide a new way
to group compounds by their effects on secondary pathways,
even before those effects are understood. This may prove to be
an alternative, potentially more effective, way to select compounds
for animal and clinical trials. Some drugs are more effective
against a related protein than against the originally
intended target. Sildenafil (Viagra™), for example, was initially
developed as a phosphodiesterase inhibitor to control cardiac
contractility, but was found to be highly specific for
phosphodiesterase 5, an isozyme whose inhibition overcomes defects in




Fig. 6 Response of FK506 signature genes in strains with deletions
in different genes. Genes with expression ratios greater than a factor
of 4 in at least one experiment are listed and their expression ratios in
the indicated strain are shown in the green (induction)-red (repression)
color scale. The genes have been divided into classes corresponding
to these expected behaviors: 'CNA-dependent' genes
respond to FK506 (50 µg/ml) except when either calcineurin genes or
FPR1 or both are deleted; 'GCN4-dependent' genes respond to FK506
except when GCN4 is deleted. These genes still respond to FK506
when calcineurin genes or FPR1 or CPH1 are deleted; that is, their responses
are not mediated by calcineurin, Cph1, or Fpr1. 'CNA- and
GCN4-independent' genes respond to FK506 in all deletion strains
tested. A 'complex behavior' class is provided for those genes that did
not match the model of FK506 response mediated through calcineurin
or Fpr1 or separately through Gcn4.

penile erection. It is possible that application of the 'decoder'
to other compounds may show that they too have a
potent activity against a target distinct from their intended
target.

The ability to decode drug effects is dependent on the
availability of functionally 'targetless' cells. In yeast, this
is being achieved by systematically disrupting each yeast
gene (Saccharomyces Deletion Consortium:
http://sequence-www.stanford.edu/group/yeast_deletion_project/deletion.html).
Efforts are underway to obtain
expression profiles from each deletion mutant strain.
Determining signatures resulting from inactivation of essential
genes presents a unique problem, but it may be
possible to do so by examining heterozygotes or by using a controllable
promoter to reduce expression of the essential gene.
Although it is already feasible to test several compounds in
dozens of yeast strains, another challenge for the 'decoder'
strategy will be the efficient selection of the mutants with deletions
in genes most likely to encode the intended drug target.
The signature correlation plots described are one metric that
could be used as part of that selection process, but others need
to be explored. Applying the 'decoder' to mammalian cells presents
additional challenges. It is considerably more difficult to
isolate functionally 'targetless' cells. Strategies involving titratable
promoters, known specific inhibitors, anti-sense RNAs, ribozymes,
and methods of targeting specific proteins for
degradation are possible and should be tested. Another limitation
is that not all cell types express the same set of genes and
therefore 'off-target' effects may be different in different cell
types. In addition, applying the 'decoder' to human cells will
also require technical improvements that allow expression profiling
from a small number of cells. Even the broader question
of whether the insensitivity of 'off-target' signatures to the disruption
of the main target is the exception or the rule can only
be answered by the accumulation of more data. Barkai and
Leibler, however, have argued in favor of robustness of biological
networks, indicating that drug perturbations ('off-target'
signatures) may be robust even when the system is subjected to
another perturbation (such as a genetic disruption) (ref. 28).
Many practical developments will be necessary if the 'decoder'
concept is to be broadly applied.

Expression arrays have been used mainly as an initial screen
for genes induced in a particular tissue or process of interest by
focusing on genes with large expression ratios. We have
found, however, that effort to refine experimental protocols
and repeat experiments increases the reliability of the data and
permits new applications. For example, it provides a larger set
Table 3 Yeast strains used

Strain    Relevant genotype                                                                                   Reference
YPH499    Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1                                           (34)
R563      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3                                (this study)
R558      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 fpr1::HIS3                                (this study)
R567      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cph1::HIS3                                (this study)
MCY300    Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3                 (21)
R132      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 cph1::kanr      (this study)
R133      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 fpr1::kanr      (this study)
R559      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3 gcn4::LEU2                     (this study)
BY4719    Mata trp1-Δ63 ura3-Δ0                                                                               (35)
BY4738    Matα trp1-Δ63 ura3-Δ0                                                                               (35)
R491      Mata/α BY4719 x BY4738                                                                              (this study)
BY4728    Mata his3-Δ200 trp1-Δ63 ura3-Δ0                                                                     (35)
BY4729    Matα his3-Δ200 trp1-Δ63 ura3-Δ0                                                                     (35)
R1226     Mata/α BY4728 x BY4729                                                                              (this study)



of genes at higher confidence levels that serve as a more
unique signature for a given protein perturbation. In addition,
it allows subtle signatures to be detected when, for example, a
protein is only partially inhibited. This may enable clinical
monitoring of small changes in protein function in disease or
toxicity states before they could otherwise be detected.
Because the functions of many genes detected on transcript arrays
are known, these microarrays are powerful tools that provide
detailed information about a cell's physiology. For
example, changes in the flux through a metabolic pathway are
reflected in transcriptional changes in genes in the pathway 7.

Furthermore, it may be possible to indirectly measure protein
activity levels from expression profiling data (S.F. et al.,
unpublished data). Thus, although the eventual development of
genomic methods allowing the direct measurement of all cellular
protein levels will be an important achievement, transcript
array technology offers an immediate and robust means
of evaluating the effects of various treatments on gene expression
and protein function.

Methods

Construction, growth and drug treatment of yeast strains. The strains
used in this study (Table 3) were constructed by standard techniques'*.
To construct strain R559, strain R563 was transformed to Leu+ with plasmid
pM12 digested by SalI and MM (provided by A. Hinnebusch and T.
Dever). Strains R132 and R133 were constructed by transforming the bacterial
kanamycin resistance cassette 10 flanked by genomic DNA from the
CPH1 and FPR1 loci, respectively, and selecting for G418-resistant
colonies. For experiments with FK506, cells were grown for three generations
to a density of 1 x 10' cells/ml in YAPD medium (YPD plus 0.004%
adenine) supplemented with 10 mM calcium chloride as described 31.
Where indicated, FK506 was added to a final concentration of 1 µg/ml
0.5 h after inoculation of the culture or to 50 µg/ml 1 h before cells were
collected. CsA was used at a final concentration of 50 µg/ml. Cells were
broken by standard procedures* 7 with the following modifications: Cell
pellets were resuspended in breaking buffer (0.2 M Tris HCl pH 7.6, 0.5 M
NaCl, 10 mM EDTA, 1% SDS), vortexed for 2 min on a VWR multi-tube
vortexer at setting 8 in the presence of 60% glass beads (425-600 µm
mesh; Sigma) and phenol:chloroform (50:50, volume/volume). After separation
of the phases, the aqueous phase was re-extracted and
ethanol-precipitated. Poly A+ RNA was isolated by two sequential
chromatographic purifications over oligo dT cellulose (New England
Biolabs, Beverly, Massachusetts) using established protocols".

For experiments using 3-AT, wild-type or his3/his3 cells were grown to
early logarithmic phase in SC medium, pelleted and resuspended in SC
medium lacking histidine for 1 hr in the presence or absence of 10 mM
3-AT, as indicated. Cells were harvested and mRNA isolated as above.
FK506 was obtained from the Swedish Hospital Pharmacy (Seattle,
Washington) and purified to homogeneity by ethyl acetate extraction by
J. Simon (Fred Hutchinson Cancer Research Center, Seattle, Washington).
CsA was obtained from Alexis Biochemicals (San Diego, California); 3-AT
was from Sigma.

Preparation and hybridization of the labeled sample. Fluorescently labeled
cDNA was prepared, purified and hybridized essentially as described 7.
Cy3- or Cy5-dUTP (Amersham) was incorporated into cDNA
during reverse transcription (Superscript II; Life Technologies) and purified
by concentrating to less than 10 µl using Microcon-30 microconcentrators
(Amicon, Houston, Texas). Paired cDNAs were resuspended in
20-26 µl hybridization solution (3 x SSC, 0.75 µg/ml polyA DNA, 0.2%
SDS) and applied to the microarray under a 22- x 30-mm coverslip for 6
h at 63 °C, all according to a published method 1 .

Fabrication and scanning of microarrays. PCR products containing
common 5' and 3' sequences (Research Genetics, Huntsville, Alabama)
were used as templates with amino-modified forward primer and
unmodified reverse primers to PCR amplify 6,065 ORFs from the S. cerevisiae
genome. Our first-pass success rate was 94%. Amplification reactions that
gave products of unexpected sizes were excluded from subsequent
analysis. ORFs that could not be amplified from purchased templates were
amplified from genomic DNA. DNA samples from 100-µl reactions were
isopropanol-precipitated, resuspended in water, brought to a final
concentration of 3x SSC in a total volume of 15 µl and transferred to
384-well microtiter plates (Genetix Limited, Christchurch, Dorset, England).
PCR products were spotted onto 1 x 3-inch polylysine-treated glass slides
by a robot built essentially according to defined specifications 1 11 
(http://cmgm.stanford.edu/pbrown/MGuide). After being printed, slides 
were processed according to published protocols'. 

Microarrays were imaged on a prototype multi-frame CCD camera in
development at Applied Precision (Issaquah, Washington). Each CCD
image frame was approximately 2-mm square. Exposure times of 2 s in
the Cy5 channel (white light through Chroma 618-648 nm excitation
filter, Chroma 657-727 nm emission filter) and 1 s in the Cy3 channel
(Chroma 535-560 nm excitation filter, Chroma 570-620 nm emission
filter) were done consecutively in each frame before moving to the next,
spatially contiguous frame. Color isolation between the Cy3 and Cy5
channels was about 100:1 or better. Frames were 'knitted' together in
software to make the complete images. The intensities of spots (about 100
µm) were quantified from the 10-µm pixels by frame-by-frame
background subtraction and intensity averaging in each channel. Dynamic
range of the resulting spot intensities was typically a ratio of 1,000
between the brightest spots and the background-subtracted additive error
level. Normalization between the channels was accomplished by
normalizing each channel to the mean intensities of all genes. This procedure is
nearly equivalent to normalization between channels using the intensity
ratio of genomic DNA spots, but is possibly more robust, as it is based on
the intensities of several thousand spots distributed over the array.
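
The channel normalization described above can be illustrated with a minimal sketch. It assumes background-subtracted spot intensities are already available as two equal-length lists; the function names and the toy intensities are hypothetical and are not taken from the authors' software.

```python
import math
from statistics import mean


def normalize_channels(cy5, cy3):
    """Scale each channel by its own mean intensity over all spots."""
    mean_cy5 = mean(cy5)
    mean_cy3 = mean(cy3)
    return [v / mean_cy5 for v in cy5], [v / mean_cy3 for v in cy3]


def log10_ratios(cy5, cy3):
    """Return log10(Cy5/Cy3) per spot after mean normalization."""
    norm_cy5, norm_cy3 = normalize_channels(cy5, cy3)
    return [math.log10(a / b) for a, b in zip(norm_cy5, norm_cy3)]


# Toy data (assumed): the Cy5 channel is uniformly twice as bright as Cy3,
# as might happen with unequal exposure or labeling efficiency.
cy5 = [200.0, 400.0, 800.0, 1000.0]
cy3 = [100.0, 200.0, 400.0, 500.0]
ratios = log10_ratios(cy5, cy3)
```

In this toy case the uniform twofold difference between channels is removed, so every normalized log ratio is essentially zero.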

Signature correlation coefficients and their confidence limits. 
Correlation coefficients between the signature ORFs of various experi- 
ments were calculated using: 

$$\rho = \frac{\sum_k x_k y_k}{\left(\sum_k x_k^2 \, \sum_k y_k^2\right)^{1/2}}$$

where x_k is the log10 of the expression ratio for the kth gene in the x
signature, and y_k is the log10 of the expression ratio for the kth gene in
the y signature. The summation is over those genes that were either up- or
down-regulated in either experiment at the 95% confidence level. These 
genes each had a less than 5% chance of being actually unregulated (hav- 
ing expression ratios departing from unity due to measurement errors 
alone). This confidence level was assigned based on an error model which 
assigns a lognormal probability distribution to each gene's expression 
ratio with characteristic width based on the observed scatter in its re- 
peated measurements (repeated arrays at the same nominal experimental 
conditions) and on the individual array hybridization quality. This latter 
dependence was derived from control experiments in which both Cy3 
and Cy5 samples were derived from the same RNA sample. For large 
numbers of repeated measurements the error reduces to the observed 
scatter. For a single measurement the error is based on the array quality 
and the spot intensity. 
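
As an illustration of the signature correlation defined above, the following sketch computes the coefficient from two lists of log10 expression ratios. The `significant` flags stand in for the 95% confidence selection, which in the text comes from the error model; here they are simply supplied by the caller. The function name and the example values are assumptions.

```python
import math


def signature_correlation(x, y, significant):
    """Uncentered correlation of log10 ratios over the selected genes.

    x, y        -- log10 expression ratios, one entry per gene
    significant -- booleans; True where the gene was up- or down-regulated
                   in either experiment (selection is assumed to be done
                   elsewhere)
    """
    sum_xy = sum_xx = sum_yy = 0.0
    for xk, yk, keep in zip(x, y, significant):
        if not keep:
            continue
        sum_xy += xk * yk
        sum_xx += xk * xk
        sum_yy += yk * yk
    if sum_xx == 0.0 or sum_yy == 0.0:
        raise ValueError("no variation among the selected genes")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical signatures for four genes; the third gene is not significant.
x = [0.5, -0.3, 0.02, 0.8]
y = [0.4, -0.2, -0.01, 0.9]
mask = [True, True, False, True]
rho_xy = signature_correlation(x, y, mask)
rho_xx = signature_correlation(x, x, mask)
```

Note that the sums are not mean-centered, which matches the formula as written: a gene with log ratio zero contributes nothing to either the numerator or the denominator.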

Random measurement errors in the x and y signatures tend to bias the 
correlation towards zero. In most experiments, most genes are not signif- 
icantly affected but do show small random measurement errors. Selecting 
only the '95% confidence' genes for the correlation calculation, rather 
than the entire genome, reduces this bias and makes the actual biological 
correlations more apparent. 

Correlations between a profile and itself are unity by definition. Error
limits on the correlation are 95% confidence limits based on the
individual measurement error bars, and assuming uncorrelated errors. They do
not include the bias mentioned above; thus, a departure of ρ from unity
does not necessarily mean that the underlying biological correlation is
imperfect. However, a correlation of 0.7 ± 0.1, for example, is very
significantly different from zero. Small (magnitude of ρ < 0.2) but formally
significant correlations in the tables and text probably are due to small
systematic biases in the Cy5/Cy3 ratios that violate the assumption of
independent measurement errors used to generate the 95% confidence
limits. Therefore, these small correlation values should be treated as not
significant. A likely source of uncorrected systematic bias is the partially
corrected scanner detector nonlinearity that differently affects the Cy3
and Cy5 detection channels.

The 1 µg/ml FK506 treatment signature was compared with more
than 40 unrelated deletion mutant strain or drug signatures. These
control profiles had correlation coefficients with the FK506 profile that were
distributed around zero (mean ρ = -0.03) with a standard deviation of
0.16 (data not shown), and none had correlations greater than ρ = 0.38.
Similarly, the calcineurin mutant strain signature correlated well with the
CsA treatment signature (ρ = 0.71 ± 0.04) but not with the signatures
from the negative controls (mean ρ = -0.02 with a standard deviation of
0.18).

Quality controls. End-to-end checks on expression ratio measurement
accuracy were provided by analyzing the variance in repeated
hybridizations using the same mRNA labeled with both Cy3 and Cy5, and also
using Cy3 and Cy5 mRNA samples isolated from independent cultures of
the same nominal strain and conditions. Biases undetected with this
procedure, such as gene-specific biases presumably due to differential
incorporation of Cy3- and Cy5-dUTP into cDNA, were minimized by doing
hybridizations in fluor-reversed pairs, in which the Cy3/Cy5 labeling of
the biological conditions was reversed in one experiment with respect to
the other. The expression ratio for each gene is then the ratio of ratios
between the two experiments in the pair. Other biases are removed by
algorithmic numerical de-trending. The magnitude of these biases in the
absence of de-trending and fluor reversal is typically about 30% in the
ratio, but may be as high as twofold for some ORFs.
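
The fluor-reversal idea can be sketched as follows. The text states only that the expression ratio is the ratio of ratios between the two experiments in the pair; taking the square root to return to the scale of a single ratio is an assumption made here for illustration, as are the function name and the numbers.

```python
import math


def fluor_reversed_ratio(ratio_forward, ratio_reversed):
    """Combine a fluor-reversed pair of Cy5/Cy3 measurements.

    ratio_forward  -- Cy5/Cy3 with the treated sample labeled with Cy5
    ratio_reversed -- Cy5/Cy3 with the treated sample labeled with Cy3

    A multiplicative gene-specific dye bias appears in both measurements
    and cancels in the ratio of ratios. The square root (an assumption,
    not stated in the text) rescales the result to a single ratio.
    """
    return math.sqrt(ratio_forward / ratio_reversed)


# Hypothetical gene: true treated/control ratio 2.0, dye bias of 30%.
true_ratio = 2.0
dye_bias = 1.3
measured_forward = true_ratio * dye_bias            # treated in Cy5
measured_reversed = (1.0 / true_ratio) * dye_bias   # treated in Cy3
combined = fluor_reversed_ratio(measured_forward, measured_reversed)
```

Under this simple multiplicative model the 30% bias drops out and the combined value recovers the assumed true ratio.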
Expression ratios are based on mean intensities over each spot. Some
smaller spots have fewer image pixels in the average. This does not
degrade accuracy noticeably until the number of pixels falls below ten, in
which case the spot is rejected from the data set. 'Wander' of spot
positions with respect to the nominal grid is adaptively tracked in array
subregions by the image processing software. Unequal spot 'wander' within
a subregion greater than half-a-spot spacing is a difficulty for the
automated quantitating algorithms; in this case, the spot is rejected from
analysis based on human inspection of the 'wander'. Any spots partially
overlapping are excluded from the data set. Less than 1% of spots
typically are rejected for these reasons.
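
The pixel-count rule above can be illustrated with a short sketch. It covers only the ten-pixel threshold and mean-intensity averaging; the 'wander' and overlap checks are not modeled. The function name, the use of a single background value per spot, and the sample numbers are assumptions.

```python
MIN_PIXELS = 10  # threshold stated in the text


def spot_mean_intensity(pixels, background):
    """Background-subtracted mean intensity, or None if the spot is rejected.

    pixels     -- raw intensities of the image pixels assigned to the spot
    background -- local background estimate (assumed to be one value)
    """
    if len(pixels) < MIN_PIXELS:
        return None  # too few pixels for a reliable average
    return sum(p - background for p in pixels) / len(pixels)


accepted = spot_mean_intensity([110.0] * 12, background=10.0)
rejected = spot_mean_intensity([110.0] * 9, background=10.0)
```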
The Transcriptional Program in
the Response of Human
Fibroblasts to Serum

Greg Schuler, Troy Moore, Jeffrey C. F. Lee, Jeffrey M. Trent,

The temporal program of gene expression during a model physiological
response of human cells, the response of fibroblasts to serum, was explored with
a complementary DNA microarray representing about 8600 different human 
genes. Genes could be clustered into groups on the basis of their temporal 
patterns of expression in this program. Many features of the transcriptional 
program appeared to be related to the physiology of wound repair, suggesting 
that fibroblasts play a larger and richer role in this complex multicellular 
response than had previously been appreciated. 



The response of mammalian fibroblasts to serum has been used as a
model for studying growth control and cell cycle progression (1).
Normal human fibroblasts require growth factors for proliferation in
culture; these growth factors are usually provided by fetal bovine
serum (FBS). In the absence of growth factors, fibroblasts enter a
nondividing state, termed G0, characterized by low metabolic activity.
Addition of FBS or purified growth factors induces proliferation of
the fibroblasts; the changes in gene expression that accompany this
proliferative response have been the subject of many studies, and the
responses of dozens of genes to serum have been characterized.

We took a fresh look at the response of 
human fibroblasts to serum, using cDNA mi- 
croarrays representing about 8600 distinct hu- 
man genes to observe the temporal program of 
transcription that underlies this response. Pri- 
mary cultured fibroblasts from human neonatal 
foreskin were induced to enter a quiescent state 
by serum deprivation for 48 hours and then 
stimulated by addition of medium containing 
10% FBS (2). DNA microarray hybridization
was used to measure the temporal changes in
mRNA levels of 8613 human genes (3) at 12
times, ranging from 15 min to 24 hours after
serum stimulation. The cDNA made from purified
mRNA from each sample was labeled with the
fluorescent dye Cy5 and mixed with a common
reference probe consisting of cDNA made from
purified mRNA from the quiescent culture (time
zero) labeled with a second fluorescent dye,
Cy3 (4). The color images of the hybridization
results (Fig. 1) were made by representing the
Cy3 fluorescent image as green and the Cy5
fluorescent image as red and merging the two
color images.

Fig. 1. The same section of the microarray is shown for three independent
hybridizations comparing RNA isolated at the 8-hour time point after serum
treatment to RNA from serum-deprived cells. Each microarray contained 9996
elements, including 9804 human cDNAs, representing 8613 different genes.
mRNA from serum-deprived cells was used to prepare cDNA labeled with
Cy3-deoxyuridine triphosphate (dUTP), and mRNA harvested from cells at
different times after serum stimulation was used to prepare cDNA labeled
with Cy5-dUTP. The two cDNA probes were mixed and simultaneously hybridized
to the microarray. The image of the subsequent scan shows genes whose
mRNAs are more abundant in the serum-deprived fibroblasts (that is,
suppressed by serum treatment) as green spots and genes whose mRNAs are
more abundant in the serum-treated fibroblasts as red spots. Yellow spots
represent genes whose expression does not vary substantially between the
two samples. The arrows indicate the spots representing the following
genes: 1, protein disulfide isomerase-related protein P5; 2, IL-8
precursor; 3, EST AA057170; and 4, vascular endothelial growth factor.

Diverse temporal profiles of gene expression could be seen among the
8613 genes surveyed in this experiment (Fig. 2); many of these genes
(about half) were unnamed expressed sequence tags (ESTs) (5). Although
diverse patterns of expression were observed, the orderly choreography
of the expression program became apparent when the results were
analyzed by a clustering and display method developed in our laboratory
for analyzing genome-wide gene expression data (6). An example of such
an analysis, here applied to a subset of 517 genes whose expression
changed substantially in response to serum (7), is shown in Fig. 2.
The entire detailed data set underlying Fig. 2 is available as a
tab-delimited table (in cluster order) at the Science Web site
(www.sciencemag.org/feature/data/984559.shl). In addition, the entire,
larger data set for the complete set of genes analyzed in this
experiment can be found at a Web site maintained by our laboratory
(genome-www.stanford.edu/serum) (8).

Fig. 2. Cluster image showing the different classes of gene expression
profiles. Five hundred seventeen genes whose mRNA levels changed in
response to serum stimulation were selected (7). This subset of genes
was clustered hierarchically into groups on the basis of the similarity
of their expression profiles by the procedure of Eisen et al. (6). The
expression pattern of each gene in this set is displayed here as a
horizontal strip. For each gene, the ratio of mRNA levels in fibroblasts
at the indicated time after serum stimulation ("unsync" denotes
exponentially growing cells) to its level in the serum-deprived (time
zero) fibroblasts is represented by a color, according to the color
scale at the bottom. The graphs show the average expression profiles for
the genes in the corresponding "cluster" (indicated by the letters A to
J and color coding). In every case examined, when a gene was represented
by more than one array element, the multiple representations in this set
were seen to have identical or very similar expression profiles, and the
profiles corresponding to these independent measurements clustered
either adjacent or very close to each other, pointing to the robustness
of the clustering algorithm in grouping genes with very similar patterns
of expression.
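
The idea of grouping genes by the similarity of their temporal profiles can be sketched with a small agglomerative clustering routine. This is only an illustration under stated assumptions: it uses Pearson correlation as the similarity measure and average linkage, and it stops at a requested number of clusters. It is not the procedure of reference (6), and the function names and toy profiles are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def cluster_profiles(profiles, n_clusters):
    """Average-linkage agglomerative clustering on correlation similarity."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None  # (average similarity, index i, index j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [pearson(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j]]
                avg = sum(sims) / len(sims)
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Toy log-ratio time courses (assumed): two transiently induced genes
# and two progressively repressed genes.
profiles = [
    [0.1, 1.0, 2.0, 0.5],
    [0.0, 0.9, 2.1, 0.4],
    [0.0, -0.5, -1.0, -1.5],
    [0.1, -0.4, -1.1, -1.4],
]
groups = cluster_profiles(profiles, n_clusters=2)
```

With these toy profiles the two induced genes and the two repressed genes end up in separate groups. The pairwise search is quadratic per merge, which is acceptable for a few hundred genes but would need a more efficient implementation for a full array.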

One measure of the reliability of the 
changes we observed is inherent in the ex- 
pression profiles of the genes. For most genes 
whose expression levels changed, we could 
see a gradual change over a few time points, 
which thus effectively provided independent 
measurements for almost all of the observa- 
tions. An additional check was provided by 
the inclusion of duplicate and, in a few cases, 
multiple array elements representing the 
same gene for about 5% of the genes included 
in this microarray. In addition, three indepen- 
dent hybridizations to different microarrays 
with mRNA samples from cells harvested 8 
hours after serum addition showed good correlation
(Fig. 1). As an independent test, we
measured the expression levels of several
genes using the TaqMan 5' nuclease fluorigenic
quantitative polymerase chain reaction
(PCR) assay (9). The expression profiles of
the genes, as measured by these two independent
methods, were very similar (Fig. 3) (10).

The transcriptional response of fibroblasts to serum was extremely
rapid. The immediate response to serum stimulation was dominated by
genes that encode transcription factors and other proteins involved in
signal transduction. The mRNAs for several genes [including c-FOS,
JUN B, and mitogen-activated protein (MAP) kinase phosphatase-1 (MKP1)]
were detectably induced within 15 min after serum stimulation (Fig. 4,
A and B). Fifteen of the genes that were observed to be induced by
serum encode known or suspected regulators of transcription (Fig. 4B).
All but one were immediate-early genes; their induction was not
inhibited by cycloheximide (11). This class of genes could be divided
into those whose induction was transient (Fig. 2, cluster E) and those
whose mRNA levels remained induced for much longer (Fig. 2, clusters I
and J). Some features of the immediate response appeared to be directed
at adaptation to the initiating signals. We observed a marked induction
of mRNA encoding MKP1, a dual-specificity phosphatase that modulates
the activity of the ERK1 and ERK2 MAP kinases (12). The coincidence of
the peak of expression of genes in cluster E (Fig. 2) with that of MKP1
(Fig. 4A) suggests the possibility that continued activity of the MAP
kinase pathway is required to maintain induction of these genes but not
of those with sustained expression (clusters I and J). The gene
encoding a second member of the dual-specificity MAP kinase phosphatase
family, known as dual-specificity protein phosphatase 6/pyst2, was
induced later, at about 4 hours after serum stimulation. Genes encoding
diverse other proteins with roles in signal transduction, ranging from
cell-surface receptors [for example, the sphingosine 1-phosphate
receptor (EDG-1), the vascular endothelial growth factor receptor, and
the type II BMP receptor] to regulators of G-protein signaling (for
example, NET1/p115 rho GEF) to DNA-binding transcription factors, were
induced by serum (Fig. 4A).

The reprogramming of the regulatory cir- 
cuits in response to serum involved not only 
induction of transcription factors but also reduced
expression of many transcriptional regulators, some
of which may play roles in maintaining the cells in
G0 or in priming them to react to wounding (Fig. 4C). Perhaps
as a consequence of the historical focus on 
genes induced by serum stimulation of fibro- 
blasts, the set of transcription factors whose 
expression diminished upon serum stimula- 
tion has been less well characterized. 

Genes known or likely to be involved in controlling and mediating the
proliferative response showed distinctive patterns of regulation.
Several genes whose products inhibit progression of the cell-division
cycle, such as p27 Kip1, p57 Kip2, and p18, were expressed in the
quiescent fibroblasts and down-regulated before the onset of cell
division. The nadir in the mRNA levels for these genes occurred between
6 and 12 hours after serum stimulation (Fig. 5A), coincident with the
passage of the fibroblasts through G1. The levels of the transcript
encoding the WEE1-like protein kinase, which is believed to inhibit
mitosis by phosphorylation of Cdc2, diminished between 4 and 8 to 12
hours after serum addition (Fig. 5A), well before the onset of M phase
at around 16 hours, raising the possibility of an additional role for
Wee1 in an earlier stage of the cell cycle or in regulating the G0 to
G1 transition. Several genes induced in the first few hours after serum
stimulation, such as the helix-loop-helix proteins ID2 and ID3 and EST
AA016305, a gene with homology to G1-S cyclins, are candidates for
roles in promoting the exit from G0.

Genes involved in mediating progression through the cell cycle were
characterized by a distinctive pattern of expression (Fig. 2, cluster
D), reflecting the coincidence of their expression with the reentry of
the stimulated fibroblasts into the cell-division cycle. The stimulated
fibroblasts replicated their DNA about 16 hours after serum treatment.
This timing was reflected by the induction of mRNA encoding both
subunits of ribonucleotide reductase and PCNA, the processivity factor
for DNA polymerase epsilon and delta. Cyclin A, Cyclin B1, Cdc2, and
CDC28 kinase, regulators of passage through the S phase and the
transition from G2 to M phase, were induced at about 16 to 20 hours
after serum addition. The kinase in the Cyclin B1-CDK pair needs to be
activated by phosphorylation. The gene encoding Cyclin-dependent kinase
7 (CDK7; a homolog of Xenopus MO15 cdk-activating kinase) was induced
in parallel with the Cdc2 and Cdc28 kinases (Fig. 5A), suggesting a
potential role for CDK7 in mediating M phase. DNA topoisomerase IIα,
required for chromosome segregation at mitosis; Mad2, a component of
the spindle checkpoint that prevents completion of mitosis (anaphase)
if chromosomes are not attached to the spindle; and the kinetochore
protein CENP-F all showed a similar expression profile.

In the hours after the serum stimulus, one of the most striking
features of the unfolding transcriptional program was the appearance of
numerous genes with known roles in processes relevant to the physiology
of wound healing. These included both genes involved in the direct role
played by fibroblasts in remodeling of the clot and the extracellular
matrix and, more notably, genes encoding proteins involved in
intercellular signaling (Fig. 5). Genes induced in this program encode
products that can (i) participate in the dynamic process of clotting,
clot dissolution, and remodeling and perhaps contribute to hemostasis
by promoting local vasoconstriction (for example, endothelin-1); (ii)
promote chemotaxis and activation of neutrophils (for example, COX2)
and recruitment and extravasation of monocytes and macrophages (for
example, MCP1); (iii) promote chemotaxis and activation of T
lymphocytes [for example, interleukin-8 (IL-8)] and B lymphocytes (for
example, ICAM-1), thus providing both innate and antigen-specific
defenses against wound infection and recruiting the phagocytic cells
that will be required to clear out the debris during remodeling of the
wound; (iv) promote angiogenesis and neovascularization (for example,
VEGF) through newly forming tissue; (v) promote migration and
proliferation of fibroblasts (for example, CTGF) and their
differentiation into myofibroblasts (for example, Vimentin); and (vi)
promote migration and proliferation of keratinocytes, leading to
reepithelialization of the wound (for example, FGF7), and promote
proliferation of melanocytes, perhaps contributing to wound
hyperpigmentation (for example, FGF2).

Coordinated regulation of groups of genes whose products act at
different steps in a common process was a recurring theme. For example,
Furin, a prohormone-processing protease required for one of the
processing steps in the generation of active endothelin, was induced in
parallel with induction of the gene encoding the precursor of
endothelin-1 (Fig. 5E) (13). Conversely, expression of CALLA/CD10, a
membrane metalloprotease that degrades endothelin-1 and other peptide
mediators of acute inflammation, was reduced. A second example is
provided by a set of five genes involved in the biosynthesis of
cholesterol (Fig. 5I). The mRNAs encoding each of these enzymes showed
sharply diminished expression beginning 4 to 6 hours after serum
stimulation of fibroblasts. A likely explanation for the coordinated
down-regulation of the cholesterol biosynthetic pathway is that serum
provides cholesterol to fibroblasts through low-density lipoproteins,
whereas in the absence of the cholesterol provided by serum, endogenous
cholesterol biosynthesis in fibroblasts is required.

Fig. 3. Independent verification of microarray quantitation. Relative mRNA
levels of the indicated genes (Man, mast/stem cell growth factor receptor)
were measured with the TaqMan 5' nuclease fluorigenic quantitative PCR
assay (9) (left) in the same samples that were used to prepare probes for
microarray hybridizations (right). Data from the TaqMan analysis were
normalized to mRNA concentrations and plotted relative to the level at
time zero, so that the results could be compared with those from the
microarray hybridizations. In general, quantitation with the two methods
gave very similar results (10).

Many of the previously studied genes that we observed to be regulated
in this program have no recognized role in any aspect of wound healing
or fibroblast proliferation. Their identification in this study may
therefore point to previously unknown aspects of these processes. A few
selected genes in this group are shown in Fig. 5H. The stanniocalcin
gene, for example (Fig. 5H), encodes a secreted protein without a
clearly identified function in human cells (14, 15). Its induction in
serum-stimulated fibroblasts suggests the possibility that it may play
a role in the wound-healing process, perhaps serving as a signal in
mediating inflammation or angiogenesis.

Fig. 4. "Reprogramming" of fibroblasts. Expression profiles of genes
whose function is likely to play a role in the reprogramming phase of
the response are shown with the same representation as in Fig. 2. In
the cases in which a gene was represented by more than one element in
the microarray, all measurements are shown. The genes were grouped into
categories on the basis of our knowledge of their most likely role.
Some genes with pleiotropic roles were included in more than one
category.

One of the most important results of this exploration was the discovery
of over 200 previously unknown genes whose expression was regulated in
specific temporal patterns during the response of fibroblasts to serum.
For example, 13 of the 40 genes in cluster D (Fig. 2) have descriptive
names that reflect their putative function. Nine of these 13 genes
(69%) encode proteins that play roles in cell cycle progression,
particularly in DNA replication and the G2-M transition. This
enrichment for cell cycle-related genes suggests that some of the
unnamed genes in this cluster (for example, EST W79311 and EST R13146,
neither of which has sequence similarity to previously characterized
genes) may represent previously unknown genes involved in this part of
the cell cycle. Similarly, a remarkable fraction of genes that were
grouped into cluster F on the basis of their expression profiles
encoded proteins involved in intercellular signaling (Fig. 2),
suggesting that a similar role should be considered for the many
unnamed genes in this cluster. A disproportionately large fraction of
the genes whose transcription diminished upon serum stimulation were
unnamed ESTs.

Our intention was to use this experiment as a model to study the
control of the transition from G0 to a proliferating state. However,
one of the defining characteristics of genome-scale expression
profiling experiments is that the examination of so many diverse genes
opens a window on all the processes that actually occur and not merely
the single process one intended to observe. Serum, the soluble fraction
of clotted blood, is normally encountered by cells in vivo in the
context of a wound. Indeed, the expression program that we observed in
response to serum suggests that fibroblasts are programmed to interpret
the abrupt exposure to serum not as a general mitogenic stimulus but as
a specific physiological signal, signifying a wound. The proliferative
response that we originally intended to study appeared to be part of a
larger physiological response of fibroblasts to a wound. Other features
of the transcriptional response to serum suggest that the fibroblast is
an active participant in a conversation among the diverse cells that
work together in wound repair, interpreting, amplifying, modifying, and
broadcasting signals controlling inflammation, angiogenesis, and
epithelial regrowth during the response to an injury.

Fig. 5. The transcriptional response to serum suggests a multifaceted
role for fibroblasts in the physiology of wound healing. The features
of the transcriptional program of fibroblasts in response to serum
stimulation that appear to be related to various aspects of the
wound-healing process and fibroblast proliferation are shown with the
same convention for representing changes in transcript levels as in
Figs. 2 and 4. (A) Cell cycle and proliferation, (B) coagulation and
hemostasis, (C) inflammation, (D) angiogenesis, (E) tissue remodeling,
(F) cytoskeletal reorganization, (G) reepithelialization, (H)
unidentified role in wound healing, and (I) cholesterol biosynthesis.
The numbers in (C) and (G) refer to genes whose products serve as
signals to neutrophils (C1), monocytes and macrophages (C2), T
lymphocytes (C3), B lymphocytes (C4), and melanocytes (G1).

We recognize that these in vitro results 
almost certainly represent a distorted and in- 
complete rendering of the normal physiolog- 
ical response of a fibroblast to a wound. 
Moreover, only the responses elicited directly 
by exposure of fibroblasts to serum were
examined. The subsequent signals from other 
cellular participants in the normal wound- 
healing process would certainly provoke fur- 
ther evolution of the transcriptional program 
in fibroblasts at the site of a wound, which 
this experiment cannot reveal. Nevertheless, 
we believe that the picture that emerged 
strongly suggests a much larger and richer 
role for the fibroblast in the orchestration of 
this important physiological process than had 
previously been suspected. 
Systematic variation in gene expression
patterns in human cancer cell lines

Douglas T. Ross 1 , Uwe Scherf 5 , Michael B. Eisen 2 , Charles M. Perou 2 , Christian Rees 2 , Paul Spellman 2 ,
Vishwanath Iyer 1 , Stefanie S. Jeffrey 3 , Matt Van de Rijn 4 , Mark Waltham 5 , Alexander Pergamenschikov 2 ,
Jeffrey C.F. Lee 6 , Deval Lashkari 7 , Dari Shalon 6 , Timothy G. Myers 8 , John N. Weinstein 5 , David Botstein 2

We used cDNA microarrays to explore the variation in expression of approximately 8,000 unique genes among the
60 cell lines used in the National Cancer Institute's screen for anti-cancer drugs. Classification of the cell lines based 
solely on the observed patterns of gene expression revealed a correspondence to the ostensible origins of the 
tumours from which the cell lines were derived. The consistent relationship between the gene expression patterns 
and the tissue of origin allowed us to recognize outliers whose previous classification appeared incorrect. Specific 
features of the gene expression patterns appeared to be related to physiological properties of the cell lines, such 
as their doubling time in culture, drug metabolism or the interferon response. Comparison of gene expression pat- 
terns in the cell lines to those observed in normal breast tissue or in breast tumour specimens revealed features of 
the expression patterns in the tumours that had recognizable counterparts in specific cell lines, reflecting the 
tumour, stromal and inflammatory components of the tumour tissue. These results provided a novel molecular 
characterization of this important group of human cell lines and their relationships to tumours in vivo. 



Introduction 

Cell lines derived from human tumours have been extensively used
as experimental models of neoplastic disease. Although such cell
lines differ from both normal and cancerous tissue, the
inaccessibility of human tumours and normal tissue makes it likely that
such cell lines will continue to be used as experimental models for
the foreseeable future. The National Cancer Institute's
Developmental Therapeutics Program (DTP) has carried out intensive
studies of 60 cancer cell lines (the NCI60) derived from tumours
from a variety of tissues and organs 1-4. The DTP has assessed many
molecular features of the cells related to cancer and
chemotherapeutic sensitivity, and has measured the sensitivities of these 60 cell
lines to more than 70,000 different chemical compounds,
including all common chemotherapeutics (http://dtp.nci.nih.gov). A
previous analysis of these data revealed a connection between the
pattern of activity of a drug and its method of action. In particular,
there was a tendency for groups of drugs with similar patterns of
activity to have related methods of action 3,5-7.

We used DNA microarrays to survey the variation in abun- 
dance of approximately 8,000 distinct human transcripts in these 
60 cell lines. Because of the logical connection between the func- 
tion of a gene and its pattern of expression, the correlation of gene 
expression patterns with the variation in the phenotype of the cell 
can begin the process by which the function of a gene can be 
inferred. Similarly, the patterns of expression of known genes can 



reveal novel phenotypic aspects of the cells and tissues studied 8-10 . 
Here we present an analysis of the observed patterns of gene 
expression and their relationship to phenotypic properties of the 
60 cell lines. The accompanying report 11 explores the relationship 
between the gene expression patterns and the drug sensitivity profiles 
measured by the DTP. The assessment of gene expression patterns 
in a multitude of cell and tissue types, such as the diverse set 
of cell lines we studied here, under diverse conditions in vitro and 
in vivo, should lead to increasingly detailed maps of the human 
gene expression program and provide clues as to the physiological 
roles of uncharacterized genes 11-16 . The databases, plus tools for 
analysis and visualization of the data, are available (http://genome-www.stanford.edu/nci60 
and http://discover.nci.nih.gov). 

Results 

We studied gene expression in the 60 cell lines using DNA 
microarrays prepared by robotically spotting 9,703 human 
cDNAs on glass microscope slides 17,18 . The cDNAs included 
approximately 8,000 different genes: approximately 3,700 repre- 
sented previously characterized human proteins, an additional 
1,900 had homologues in other organisms and the remaining 
2,400 were identified only by ESTs. Due to ambiguity of the iden- 
tity of the cDNA clones used in these studies, we estimated that 
approximately 80% of the genes in these experiments were correctly 
identified. The identities of approximately 3,000 cDNAs 
from these experiments have been sequence-verified, including 
all of those referred to here by name. 

Fig. 1 Gene expression patterns related to the tissue of origin of the cell lines. Two-dimensional 
hierarchical clustering was applied to expression data from a set of 1,161 cDNAs 
measured across 64 cell lines. The 1,161 cDNAs were those (of 9,703 total) with transcript 
levels that varied by at least sevenfold (log2(ratio) > 2.8) relative to the reference pool in at 
least 4 of 60 cell lines. This effectively selected genes with the greatest variation in expression 
level across the 60 cell lines (including those genes not well represented in the reference 
pool), and therefore highlighted those gene expression patterns that best 
distinguished the cell lines from one another. Data from 64 hybridizations were used, one 
for each cell line plus the two additional independent representations of each of the cell 
lines K562 and MCF7. The two cell lines represented in triplicate were correspondingly 
weighted for the gene clustering so that each of the 60 cell lines contributed equally to the 
clustering. a, The cell-line dendrogram, with the terminal branches coloured to reflect the 
ostensible tissue of origin of the cell line (red, leukaemia; green, colon; pink, breast; purple, 
prostate; light blue, lung; orange, ovarian; yellow, renal; grey, CNS; brown, melanoma; 
black, unknown (NCI/ADR-RES)). The scale to the right of the dendrogram depicts the correlation 
coefficient represented by the length of the dendrogram branches connecting 
pairs of nodes. Note that the two triplets of replicated cell lines (K562 and MCF7) cluster 
tightly together and were well differentiated from even the most closely related cell lines, 
indicating that this clustering of cell lines is based on characteristic variations in their gene 
expression patterns rather than artefacts of the experimental procedures. b, A coloured 
representation of the data table, with the rows (genes) and columns (cell lines) in cluster 
order. The dendrogram representing hierarchical relationships between genes was omitted 
for clarity, but is available (http://genome-www.stanford.edu/nci60). The colour in each 
cell of this table reflects the mean-adjusted expression level of the gene (row) and cell line 
(column). The colour scale used to represent the expression ratios is shown. The labels 
'3a-3d' in (b) refer to the clusters of genes shown in detail in Fig. 3. 
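
The selection rule stated in the Fig. 1 legend (a log2 ratio beyond 2.8, roughly sevenfold, in at least 4 cell lines) can be illustrated with a minimal sketch. The function name, the gene names and the toy values are hypothetical, and treating "varied by at least sevenfold" as a change in either direction is an assumption, not something the legend states.

```python
def select_variable_genes(log2_ratios, min_abs_log2=2.8, min_lines=4):
    """Return genes whose |log2(ratio)| exceeds the cutoff in enough cell lines.

    log2_ratios maps a gene name to a list of log2(Cy5/Cy3) values, one per
    cell line; None marks a missing measurement.
    Assumption: a change in either direction (up or down) counts.
    """
    selected = []
    for gene, values in log2_ratios.items():
        hits = sum(1 for v in values if v is not None and abs(v) > min_abs_log2)
        if hits >= min_lines:
            selected.append(gene)
    return selected


# Toy data: hypothetical genes measured in six cell lines.
toy = {
    "GENE_A": [3.1, -3.0, 2.9, 3.5, 0.1, 0.0],     # 4 values beyond the cutoff
    "GENE_B": [0.2, 0.1, -0.3, 3.0, 2.9, None],    # only 2 values beyond the cutoff
    "GENE_C": [-2.9, -3.2, -4.0, -2.81, 1.0, 0.5], # 4 values beyond the cutoff
}

print(select_variable_genes(toy))   # ['GENE_A', 'GENE_C']
print(round(2 ** 2.8, 2))           # 6.96, i.e. about sevenfold
```

The last line only checks the arithmetic behind the legend: a log2 ratio of 2.8 corresponds to a ratio of about 7.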

Each hybridization compared Cy5-labelled cDNA reverse tran- 
scribed from mRNA isolated from one of the cell lines with Cy3- 
labelled cDNA reverse transcribed from a reference mRNA 
sample. This reference sample, used in all hybridizations, was 
prepared by combining an equal mixture of mRNA from 12 of 
the cell lines (chosen to maximize diversity in gene expression as 
determined primarily from two-dimensional gel studies 2 ). By 
comparing cDNA from each cell line with a common reference, 
variation in gene expression across the 60 cell lines could be 
inferred from the observed variation in the normalized Cy5/Cy3 
ratios across the hybridizations. 
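
As an illustration of how a normalized Cy5/Cy3 ratio might be derived for one hybridization, the sketch below computes log2 ratios and centres them on the median. The text does not say how the ratios were normalized, so the median-centring step, the function name and the toy intensities are all assumptions made for illustration.

```python
import math
from statistics import median


def normalized_log2_ratios(cy5, cy3):
    """Return median-centred log2(Cy5/Cy3) values for one hybridization.

    cy5 and cy3 are equal-length lists of background-subtracted intensities
    (cell-line channel and reference channel). Spots with a non-positive
    intensity in either channel yield None.
    Assumption: normalization is done by subtracting the median log ratio.
    """
    raw = [math.log2(r / g) if r > 0 and g > 0 else None
           for r, g in zip(cy5, cy3)]
    centre = median(v for v in raw if v is not None)
    return [None if v is None else v - centre for v in raw]


# Toy intensities for four hypothetical spots.
cy5 = [800.0, 200.0, 400.0, 0.0]
cy3 = [200.0, 200.0, 100.0, 50.0]
print(normalized_log2_ratios(cy5, cy3))  # [0.0, -2.0, 0.0, None]
```

Because every cell line is compared with the same reference pool, values produced this way for different hybridizations are directly comparable gene by gene.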

To assess the contribution of artefactual sources of variation in 
the experimentally measured expression patterns, K562 and 
MCF7 cell lines were each grown in three independent cultures, 
and the entire process was carried out independently on mRNA 
extracted from each culture. The variance in the triplicate fluo- 
rescence ratio measurements approached a minimum when the 
fluorescence signal was greater than approximately 0.4% of the 
measurable total signal dynamic range above background in 
either channel of the hybridization. We selected the subset of 
spots for which significant signal was present in both the numer- 
ator and denominator of the ratios by this criterion to identify 
the best-measured spots. The pair-wise correlation coefficients 
for the triplicates of the set of genes that passed this quality con- 
trol level (6,992 spots included for the MCF7 samples and 6,161 
spots for K562) ranged from 0.83 to 0.92 (for graphs and details, 
see http://genome-www.stanford.edu/nci60). 
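
The reproducibility check described above can be sketched as follows: keep only spots whose signal exceeds about 0.4% of the dynamic range in both channels of every replicate, then compute pair-wise Pearson correlations of the log ratios. The data layout, function names, the 16-bit dynamic range and the toy values are hypothetical choices for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def well_measured(spots, dynamic_range, fraction=0.004):
    """Indices of spots with signal above the cutoff in both channels of every replicate.

    spots[i] is a list of replicates; each replicate is a tuple
    (cy5_signal, cy3_signal, log2_ratio).
    """
    cutoff = fraction * dynamic_range
    return [i for i, reps in enumerate(spots)
            if all(cy5 > cutoff and cy3 > cutoff for cy5, cy3, _ in reps)]


def replicate_correlations(spots, dynamic_range):
    """Pair-wise correlations of log2 ratios between replicates, on well-measured spots."""
    keep = well_measured(spots, dynamic_range)
    n_reps = len(spots[0])
    return {(a, b): pearson([spots[i][a][2] for i in keep],
                            [spots[i][b][2] for i in keep])
            for a, b in combinations(range(n_reps), 2)}


# Toy triplicate data for four hypothetical spots (assumed 16-bit scanner range).
spots = [
    [(1000, 1000, 1.0), (1200, 900, 1.1), (900, 1100, 0.9)],
    [(5000, 4000, -2.0), (5200, 4100, -1.9), (4800, 3900, -2.1)],
    [(100, 3000, 4.0), (90, 2900, -3.0), (120, 3100, 0.5)],   # weak Cy5 signal
    [(2000, 2000, 0.0), (2100, 1900, 0.2), (1900, 2100, -0.1)],
]
print(well_measured(spots, 65535))
print(replicate_correlations(spots, 65535))
```

In the toy data the third spot is dropped because its Cy5 signal falls below the cutoff, which is also the spot whose replicate ratios disagree most.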

To make the orderly features in the data more apparent, we used 
a hierarchical clustering algorithm 1 ^ and a pseudo-colour visu- 



alization matrix 3 - 21 . The object of the clustering was to group cell 
lines with similar repertoires of expressed genes and to group 
genes whose expression level varied among the 60 cell lines in a 
similar manner. Clustering was performed twice using different 
subsets of genes to assess the robustness of the analysis. In one case 
(Fig. 1), we concentrated on those genes that showed the most 
variation in expression among the 60 cell lines (1,161 total). A second 
analysis (Fig. 2) included all spots that were thought to be well 
measured in the reference set (6,831 spots). 
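
To make the grouping step concrete, the sketch below clusters a few expression profiles by agglomerative hierarchical clustering. The text cites a hierarchical clustering algorithm without giving details here, so the use of Pearson correlation as the similarity measure and of average linkage is an assumption; the cell-line names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering of named profiles.

    Returns the merge steps in order as (members_a, members_b, similarity),
    where similarity is the mean pair-wise Pearson correlation between the
    two clusters being joined (average linkage, an assumed choice).
    """
    names = list(profiles)
    sim = {frozenset((a, b)): pearson(profiles[a], profiles[b])
           for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(sim[frozenset((a, b))]
                            for a in clusters[i] for b in clusters[j])
                mean = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or mean > best[0]:
                    best = (mean, i, j)
        mean, i, j = best
        merges.append((clusters[i], clusters[j], mean))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy log-ratio profiles (four genes) for four hypothetical cell lines.
profiles = {
    "LEUK_1": [2.0, 1.8, -1.0, -1.2],
    "LEUK_2": [2.1, 1.7, -0.9, -1.3],
    "COLON_1": [-1.0, -1.2, 2.0, 1.9],
    "COLON_2": [-0.8, -1.1, 2.2, 1.8],
}
merges = average_linkage(profiles)
for left, right, similarity in merges:
    print(left, right, round(similarity, 3))
```

The same routine can be applied to the transposed table to cluster genes instead of cell lines, which is what makes the clustering two-dimensional. The merge similarities correspond to the correlation scale drawn beside a dendrogram.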

Gene expression patterns related to the histologic 
origins of the cell lines 

The most notable property of the clustered data was that cell lines 
with common presumptive tissues of origin grouped together 
(Figs 1a and 2). Cell lines derived from leukaemia, melanoma, 
central nervous system, colon, renal and ovarian tissue were clus- 
tered into independent terminal branches specific to their respec- 
tive organ types with few exceptions. Cell lines derived from 
non-small-cell lung carcinoma and breast tumours were distributed 
in multiple different terminal branches suggesting that their gene 
expression patterns were more heterogeneous. 

Many of these coherent cell line clusters were distinguished by 
the specific expression of characteristic groups of genes 
(Fig. 3a-d). For example, a cluster of approximately 90 genes was 
highly expressed in the melanoma-derived lines (Fig. 3c). This set 
was enriched for genes with known roles in melanocyte biology, 
including tyrosinase and dopachrome tautomerase (TYR and 
DCT; two subunits of an enzyme complex involved in melanin 
synthesis 22 ), MART1 (MLANA; which is being investigated as a 
target for immunotherapy of melanoma 23 ) and S100-β (S100B; 
which has been used as an antigenic marker in the diagnosis of 
melanoma). LOX-IMVI, the seventh line designated as melanoma 
in the NCI60, did not show this characteristic pattern. Although 
isolated from a patient with melanoma, LOX-IMVI has previously 
been noted to lack melanin and other markers useful for identification 
of melanoma cells 1 . 

Fig. 2 Gene expression patterns related to 
other cell-line phenotypes. a, We applied 
two-dimensional hierarchical clustering to 
expression data from a set of 6,831 cDNAs 
measured across the 64 cell lines. The 6,831 
cDNAs were those with a minimum fluorescence 
signal intensity of approximately 0.4% 
of the dynamic range above background in 
the reference channel in each of the six 
hybridizations used to establish reproducibility. 
This effectively selected those spots that 
provided the most reliable ratio measurements 
and therefore identified a subset of 
genes useful for exploring patterns comprised 
of those whose variation in expression across 
the 60 cell lines was of moderate magnitude. 
b, Cluster-ordered data table. c, Doubling 
time of cell lines. Cell lines are given in cluster 
order. Values are plotted relative to the mean. 
Doubling times greater than the mean are 
shown in green, those with doubling time less 
than the mean are shown in red. d, Three 
related gene clusters that were enriched for 
genes whose expression level variation was 
correlated with cell line proliferation rate. 
Each of the three gene clusters (clustered 
solely on the basis of their expression patterns) 
showed enrichment for sets of genes 
involved in distinct functional categories (for 
example, ribosomal genes versus genes 
involved in pre-RNA splicing). e, Gene cluster 
in which all characterized and sequence-verified 
cDNAs encode genes known to be regulated 
by interferons. f, Gene cluster enriched 
for genes that have been implicated in drug 
metabolism (indicated by asterisks). A further 
property of the gene clustering evident here 
and in Fig. 2 is the strong tendency for redundant 
representations of the same gene to 
cluster immediately adjacent to one another, 
even within larger groups of genes with very 
similar expression patterns. In addition to 
illustrating the reproducibility and consistency 
of the measurements, and providing 
independent confirmation of many of our 
measurements, this property also demonstrates 
that these, and probably all, genes 
have nearly unique patterns of variation 
across the 60 cell lines. If this were not the 
case, and multiple genes had identical patterns 
of variation, we would not expect to be 
able to distinguish, by clustering on the basis 
of expression variation, duplicate copies of 
individual genes from the other genes with 
identical expression patterns. 

Paradoxically, two related cell lines (MDA-MB435 and MDA- 
N), which were derived from a single patient with breast cancer 
and have been conventionally regarded as breast cancer cell lines, 
shared expression of the genes associated with melanoma. MDA- 
MB435 was isolated from a pleural effusion in a patient with 
metastatic ductal adenocarcinoma of the breast 24,25 . It remains 
possible that the origin of the cell line was a breast cancer, and that 
its gene expression pattern is related to the neuroendocrine fea- 
tures of some breast cancers 26 . But our results suggest that this cell 
line may have originated from a melanoma, raising the possibility 
that the patient had a co-existing occult melanoma. 

The higher-level organization of the cell-line tree — in which 
groups span cell lines from different tissue types — also reflected 
shared biological properties of the tissues from which the cell 
lines were derived. The carcinoma-derived cell lines were divided 
into major branches that separated those that expressed genes 
characteristic of epithelial cells from those that expressed genes 
more typical of stromal cells. A cluster of genes is shown (Fig. 3b) 
that is most strongly expressed in cell lines derived from colon 
carcinomas, six of seven ovarian-derived cell lines and the two 
breast cancer lines positive for the oestrogen receptor. The named 
genes in this cluster have been implicated in several aspects of 
epithelial cell biology 27 . The cluster was enriched for genes whose 
products are known to localize to the basolateral membrane of 
epithelial cells, including those encoding components of 
adherens complexes (for example, desmoplakin (DSP), 
periplakin (PPL) and plakoglobin (JUP)), an epithelial-expressed 
cell-cell adhesion molecule (M4S1) and a sodium/hydrogen 
ion exchanger 28-31 (SLC9A1). It also contained genes 
that encode putative transcriptional regulators of epithelial morphogenesis, 
a human homologue of a Drosophila melanogaster 
epithelial-expressed tumour suppressor (LLGL1) and a homeobox 
gene thought to control calcium-mediated adherence in 
epithelial cells 32,33 (MSX2). 

In contrast, a separate, major branch of the cell-line dendrogram 
(Fig. 1a) included all glioblastoma-derived cell lines, all 
renal-cell-carcinoma-derived cell lines and the remaining carcinoma-derived 
lines. The characteristic set of genes expressed in 
this cluster included many whose products are involved in stromal 
cell functions (Fig. 3d). Indeed, the two cell lines originally 
described as 'sarcoma-like' in appearance (Hs578T, breast carcinosarcoma, 
and SF539, gliosarcoma) expressed most of these 
genes 34,35 . Although no single gene was uniformly characteristic 
of this cluster, each cell line showed a distinctive pattern of 
expression of genes encoding proteins with roles in synthesis or 
modification of the extracellular matrix (for example, caldesmon 
(CALD1), cathepsins, thrombospondin (THBS), lysyl oxidase 
(LOX) and collagen subtypes). Although the ovarian and most 
non-small-cell-lung-derived carcinomas expressed genes characteristic 
of both epithelial cells and stromal cells, they probably 
clustered with the CNS and renal cell carcinomas in this analysis 
because genes characteristically expressed in stromal cells were 
more abundantly represented in this gene set. 

Physiological variation reflected 
in gene expression patterns 

A cluster diagram of 6,831 genes (Fig. 2) is useful for exploring 
clusters of genes whose variation in mRNA levels was not obvi- 
ously attributable to cell or tissue type. We identified some gene 
clusters that were enriched for genes involved in specific cellular 



processes; the variation in their expression levels may reflect cor- 
responding differences in activity of these processes in the cell 
lines. For example, a cluster of 1,159 genes (Fig. 2a) included 
many whose products are necessary for progression through the 
cell cycle (such as CCNA1, MCM106 and MAD2L1), RNA pro- 
cessing and translation machinery (such as RNA helicases, 
hnRNPs and translation elongation factors) and traditional 
pathologic markers used to identify proliferating cells (MKI67). 
Within this large cluster were smaller clusters enriched for genes 
with more specialized roles. One cluster was highly enriched for 
numerous ribosomal genes, whereas another was more enriched 
for genes encoding RNA-splicing factors. The variation in 
expression of these ribosomal genes was significantly correlated 
with variation in the cell doubling time (correlation coefficient of 
0.54), supporting the notion that the genes in this cluster were 
regulated in relation to cell proliferation rate or growth rate in 
these cell lines. 

In a smaller gene cluster (Fig. 2d), all of the named genes were 
previously known to be regulated by interferons 1 3 -*. Additional 
groups of interferon-regulated genes showed distinct patterns of 
expression (data not shown), suggesting that the NCI60 cell lines 
exhibited variation in activity of interferon-response pathways, 
which was reflected in gene expression patterns 36 . 

Another cluster (Fig. 2e) contained several genes encoding 
proteins with possible interrelated roles in drug metabolism, 
including glutamate-cysteine ligase (GLCLC, the enzyme responsible 
for the rate-limiting step of glutathione synthesis), thioredoxin 
(TXN) and thioredoxin reductase (TXNRD1; enzymes 
involved in regulating redox state in cells), and MRP1 (a drug 
transporter known to efficiently transport glutathione-conjugated 
compounds 37 ). The elevated expression of this set of genes 
in a subset of these cell lines may reflect selection for resistance to 
chemotherapeutics. 

Cell lines facilitate interpretation of gene expression 
patterns in complex clinical samples 

Like many other types of cancer, tumours of the breast typically 
have a complex histological organization, with connective tissue 
and leukocytic infiltrates interwoven with tumour cells. To 
explore the possibility that variation in gene expression in the 
tumour cell lines might provide a framework for interpreting the 
expression patterns in tumour specimens, we compared RNA 
isolated from two breast cancer biopsy samples, a sample of normal 
breast tissue and the NCI60 cell lines derived from breast 
cancers (excluding MDA-MB-435 and MDA-N) and leukaemias 
(Fig. 4). This clustering highlighted features of the gene expres- 
sion pattern shared between the cancer specimens and individual 
cell lines derived from breast cancers and leukaemias. 

The genes encoding keratin 8 (KRT8) and keratin 19 (KRT19), 
as well as most of the other 'epithelial' genes defined in the complete 
NCI60 cell line cluster, were expressed in both of the biopsy 
samples and the two breast-derived cell lines, MCF-7 and T47D, 
expressing the oestrogen receptor, suggesting that these transcripts 
originated in tumour cells with features similar to those of 
luminal epithelial cells (Fig. 5a). Expression of a set of genes characteristic 
of stromal cells, including collagen genes (COL3A1, 
COL5A1 and COL6A1) and smooth muscle cell markers 
(TAGLN), was a feature shared by the tumour sample and the 
stromal-like cell lines Hs578T and BT549 (Fig. 5b). This feature 
of the expression pattern seen in the tumour samples is likely to 
be due to the stromal component of the tumour. The tumours 
also shared expression of a set of genes (Fig. 5c) with the multiple 
myeloma cell line (RPMI-8226), notably including 
immunoglobulin genes, consistent with the presence of B cells 
in the tumour (this was confirmed by staining with anti-immunoglobulin 
antibodies; data not shown). Therefore, distinct 
sets of genes with co-varying expression among the samples 
(Fig. 4, arrow) appear to represent distinct cell types that can be 
distinguished in breast cancer tissue. A fourth cluster of genes, 
more highly expressed in all of the cell lines than in any of the 
clinical specimens, was enriched for genes present in the 'proliferation' 
cluster described above (Fig. 5d). The variation in 
expression of these genes likely paralleled the difference in proliferation 
rate between the rapidly cycling cultured cell lines and the 
much more slowly dividing cells in tissues. 

Discussion 

Newly available genomics tools allowed us to explore variation in 
gene expression on a genomic scale in 60 cell lines derived from 
diverse tumour tissues. We used a simple cluster analysis to iden- 
tify the prominent features in the gene expression patterns that 
appeared to reflect 'molecular signatures' of the tissue from 
which the cells originated. The histological characteristics of the 
cell lines that dominated the clustering were pervasive enough 
that similar relationships were revealed when alternative subsets 
of genes were selected for analysis. Additional features of the 
expression pattern may be related to variation in physiological 
attributes such as proliferation rate and activity of interferon- 
response pathways. 

The properties of the tumour-derived cell lines in this study 
have presumably all been shaped by selection for resistance to 
host defences and chemotherapeutics and for rapid proliferation 
in the tissue culture environment of synthetic growth media, fetal 
bovine serum and a polystyrene substratum. But the primary 
identifiable factor accounting for variation in gene expression 
patterns among these 60 cell lines was the identity of the tissue 
from which each cell line was ostensibly derived. For most of the 
cell lines we examined, neither physiological nor experimental 
adaptation for growth in culture was sufficient to overwrite the 
gene expression programs established during differentiation in 
vivo. Nevertheless, the prominence of mesenchymal features in 
the cell lines isolated from glioblastomas and carcinomas may 
reflect a selection for the relative ease of establishment of cell 
lines expressing stromal characteristics, perhaps combined with 
physiological adaptation to tissue culture conditions 38-40 . 



Fig. 4 Comparison of the gene expression patterns in clinical breast cancer 
specimens and cultured breast cancer and leukaemia cell lines. a, Two-dimensional 
hierarchical clustering applied to gene expression data for two breast 
cancer specimens, a lymph node metastasis from one patient, normal breast 
and the NCI60 breast and leukaemia-derived cell lines. The gene expression 
data from tissue specimens was clustered along with expression data from a 
subset of the NCI60 cell lines to explore whether features of expression patterns 
observed in specific lines could be identified in the tissue samples. Labels 
indicate gene clusters (shown in detail in Fig. 5) that may be related to specific 
cellular components of the tumour specimens. b, Breast cancer specimen 16 
stained with anti-keratin antibodies, showing the complex mix of cell types 
characteristically found in breast tumours. The arrows highlight the different 
cellular components of this tissue specimen that were distinguished by the 
gene expression cluster analysis (Fig. 5). 



Biological themes linking genes with related expression pat- 
terns may be inferred in many cases from the shared attributes of 
known genes within the clusters. Uncharacterized cDNAs are 
likely to encode proteins that have roles similar to those of the 
known gene products with which they appear to be co-regulated. 
Still, for several clusters of genes, we were unable to discern a com- 
mon theme linking the identified members of the cluster. Further 
exploration of their variation in expression under more diverse 
conditions and more comprehensive investigation of the physiology 
of the NCI60 cells may provide insight 10 . The relationship of 
the gene expression patterns to the drug sensitivity patterns mea- 
sured by the DTP is an example of linking variation in gene 
expression with more subtle and diverse phenotypic variation 1 

The patterns of gene expression measured in the NCI60 cell 
lines provide a framework that helps to distinguish the cells that 
express specific sets of genes in the histologically complex breast 
cancer specimens 41 . Although it is now feasible to analyse gene 
expression in micro-dissected tumour specimens 42,43 , this observation 
suggests that it will be possible to explore and interpret 
some of the biology of clinical tumour samples by sampling them 
intact. As is useful in conventional morphological pathology, one 
might be able to observe interactions between a tumour and its 
microenvironment in this way. These relationships will be clari- 
fied by suitable analysis of gene expression patterns from intact as 
well as dissected tumours 12,14,15,41 . 

Methods 

cDNA clones. We obtained the 9,703 human cDNA clones (Research Genetics) 
used in these experiments as bacterial colonies in 96-well microtitre 
plates 9 . Approximately 8,000 distinct Unigene clusters (representing nominally 
unique genes) were represented in this set of clones. All genes identified 
here by name represent clones whose identities were confirmed by re-sequencing, 
or by the criterion that two or more independent cDNA clones 
ostensibly representing the same gene had nearly identical gene expression 
patterns. A single-pass 3' sequence re-verification was attempted for every 
clone after re-streaking for single colonies. For a subset of genes for which 
quality 3' sequence was not obtained, we attempted to confirm identities by 
5' sequencing. Of the subset of clones selected for 5' sequence verification 
on the basis of an interesting pattern of expression (888 total), 331 were correctly 
identified, 57 incorrectly identified and 500 indeterminate (poor 
quality sequence). We estimated that 15%-20% of array elements contained 
DNA representing more than one clone per well. So far, the identities of 
~3,000 clones have been verified. The full list of clones used and their nominal 
identities are available (gene names preceded by the designation 'SID#' 
(Stanford Identification) represent clones whose identities have not yet been 
verified; http://genome-www.stanford.edu:8000/nci60). 

Production of cDNA microarrays. The arrays used in this experiment were 
produced at Synteni Inc. (now Incyte Pharmaceuticals). Each insert was 
amplified from a bacterial colony by sampling 1 µl of bacterial media and 
performing PCR amplification of the insert using consensus primers for 
the three plasmids represented in the clone set (5'-TTGTAAAACGACGGCCAGTG-3', 
5'-CACACAGGAAACAGCTATG-3'). Each PCR product 
(100 µl) was purified by gel exclusion, concentrated and suspended in 
3×SSC (10 µl). The PCR products were then printed on treated glass 
microscope slides using a robot with four printing tips. Detailed protocols 
for assembling and operating a microarray printer, and printing and experimental 
application of DNA microarrays are available 
(http://cmgm.stanford.edu/pbrown). 

Preparation of mRNA and reference pool. Cell lines were grown from NCI 
DTP frozen stocks in RPMI-1640 supplemented with phenol red, glutamine 
(2 mM) and 5% fetal calf serum. To minimize the contribution of variations 
in culture conditions or cell density to differential gene expression, we grew 
each cell line to 80% confluence and isolated mRNA 24 h after transfer to 
fresh medium. The time between removal from the incubator and lysis of the 
cells in RNA stabilization buffer was minimized (<1 min). Cells were lysed in 
buffer containing guanidium isothiocyanate and total RNA was purified 
with the RNeasy purification kit (Qiagen). We purified mRNA as needed 
using a poly(A) purification kit (Oligotex, Qiagen) according to the manufacturer's 
instructions. Denaturing agarose gel electrophoresis assessed the 
integrity and relative contamination of mRNA with ribosomal RNA. 

The breast tumours were surgically excised from patients and rapidly 
transported to the pathology laboratory, where samples for microarray 
analysis were quickly frozen in liquid nitrogen and stored at -80 °C until 
use. A frozen tumour specimen was removed from the freezer, cut into 
small pieces (~50-100 mg each), immediately placed into 10-12 ml of Trizol 
reagent (Gibco-BRL) and homogenized using a PowerGen 125 Tissue 
Homogenizer (Fisher Scientific), starting at 5,000 r.p.m. and gradually 
increasing to ~20,000 r.p.m. over a period of 30-60 s. We processed the Trizol/tumour 
homogenate as described in the Trizol protocol, including an 
initial step to remove fat. Once total RNA was obtained, we isolated mRNA 
with a FastTrack 2.0 kit (Invitrogen) using the manufacturer's protocol for 
isolating mRNA starting from total RNA. The normal breast samples were 
obtained from Clontech. 



Fig. 5 Histologic features of breast cancer biopsies can be recognized by gene expression patterns. Enlarged views of the 
cluster diagram in Fig. 4, showing gene clusters enriched for genes expressed in specific cell types of the tumour specimens, 
as distinguished by clustering with the cultured cell lines. a, Epithelial cluster: genes expressed in the cell lines (T47D and 
MCF7) derived from breast cancers positive for the oestrogen receptor and in the tumours. b, Stromal cell cluster: genes 
expressed in the stromal-like cell lines (Hs578T and BT549) and in the tumour specimens. c, Leukocyte clusters: genes 
expressed in leukocyte-derived cell lines and in the tumour specimens. d, Proliferation cluster: genes more highly expressed 
in the cell lines compared with the tumour specimens and normal breast, consistent with the higher proliferation rate of 
cells in culture. 



nature genetics • volume 24 • march 2000 






©2000 Nature America Inc. • http://genetics.nature.com



We combined mRNA from the following cells in equal quantities to
make the reference pool: HL-60 (acute myeloid leukaemia) and K562
(chronic myeloid leukaemia); NCI-H226 (non-small-cell lung); COLO
205 (colon); SNB-19 (central nervous system); LOX-IMVI (melanoma);
OVCAR-3 and OVCAR-4 (ovarian); CAKI-1 (renal); PC-3 (prostate); and
MCF7 and Hs578T (breast). The criteria for selection of the cell lines in
the reference are described in detail in the accompanying manuscript (ref. 12).

Doubling-time calculations. We calculated doubling times based on routine
NCI60 cell line compound screening data; they reflect the doubling
times for cells inoculated into 96-well plates at the screening inoculation
densities and grown in RPMI 1640 medium supplemented with 5%
fetal bovine serum for 48 h. We measured cell populations using a
sulforhodamine B optical density assay. The doubling time constant k
was calculated using the equation $N/N_0 = e^{kt}$, where $N_0$ is the optical density
for control (untreated) cells at time zero, $N$ is the optical density for control cells
after 48-h incubation, and $t$ is 48 h. The same equation was then used with the
derived k to calculate the doubling time t by setting $N/N_0 = 2$. For a given cell
line, we obtained $N_0$ and $N$ values by averaging optical densities (n>6,000)
obtained for each cell line for a year's screening. Data and experimental details
are available (http://dtp.nci.nih.gov).
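
The calculation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the screening programme's own code: the function names and the example optical densities are hypothetical, and only the relation N/N0 = e^(kt) comes from the text.

```python
import math

def growth_constant(od_start, od_end, hours=48.0):
    """Solve N/N0 = exp(k*t) for k, given optical densities at 0 h and t h."""
    return math.log(od_end / od_start) / hours

def doubling_time(k):
    """Time t at which N/N0 = 2 for a growth constant k (per hour)."""
    return math.log(2.0) / k

# Hypothetical optical densities: a fourfold increase over 48 h.
k = growth_constant(0.25, 1.00)
print(round(doubling_time(k), 2))  # 24.0
```

A fourfold increase in 48 h corresponds to two doublings, so the doubling time is 24 h.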



at http://rana.stanford.edu/software). Each spot was defined by manual
positioning of a grid of circles over the array image. For each fluorescent
image, the average pixel intensity within each circle was determined, and a
local background was computed for each spot equal to the median pixel
intensity in a square of 40 pixels in width and height centred on the spot
centre, excluding all pixels within any defined spots. Net signal was determined
by subtraction of this local background from the average intensity
for each spot. Spots deemed unsuitable for accurate quantitation because
of array artefacts were manually flagged and excluded from further analysis.
Data files generated by ScanAlyze were entered into a custom database
that maintains web-accessible files. Signal intensities between the two
fluorescent images were normalized by applying a uniform scale factor to all
intensities measured for the Cy5 channel. The normalization factor was
chosen so that the mean log(Cy3/Cy5) for a subset of spots that achieved a
minimum quality parameter (approximately 6,000 spots) was 0. This effectively
defined the signal-intensity-weighted 'average' spot on each array to
have a Cy3/Cy5 ratio of 1.0.
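
The background subtraction and normalization steps can be illustrated as follows. This is a sketch only, not ScanAlyze's implementation: the function names and intensity values are hypothetical, and the spots passed in are assumed to be the subset that already met the quality criterion.

```python
import math
from statistics import mean, median

def net_signal(spot_pixels, background_pixels):
    """Average spot intensity minus the median of the local background."""
    return mean(spot_pixels) - median(background_pixels)

def cy5_scale_factor(cy3, cy5):
    """Uniform factor s such that mean(log(Cy3 / (s * Cy5))) is 0."""
    return math.exp(mean(math.log(g / r) for g, r in zip(cy3, cy5)))

def normalize_cy5(cy3, cy5):
    """Apply the same scale factor to every Cy5 intensity."""
    s = cy5_scale_factor(cy3, cy5)
    return [r * s for r in cy5]

# Hypothetical net intensities for three quality-filtered spots.
cy3 = [100.0, 400.0, 900.0]
cy5 = [50.0, 100.0, 450.0]
scaled = normalize_cy5(cy3, cy5)
```

Because the factor is the exponential of the mean log ratio, dividing it out leaves a mean log(Cy3/Cy5) of zero across the chosen spots.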



Preparation and hybridization of fluorescent labelled cDNA. For each
comparative array hybridization, labelled cDNA was synthesized by reverse
transcription from test cell mRNA in the presence of Cy5-dUTP, and from
the reference mRNA with Cy3-dUTP, using the SuperScript II reverse-transcription
kit (Gibco-BRL). For each reverse transcription reaction, mRNA
(2 µg) was mixed with an anchored oligo-dT (d-20T-d(AGC)) primer (4
µg) in a total volume of 15 µl, heated to 70 °C for 10 min and cooled on ice.
To this sample, we added an unlabelled nucleotide pool (0.6 µl; 25 mM
each dATP, dCTP, dGTP, and 15 mM dTTP), either Cy3 or Cy5 conjugated
dUTP (3 µl; 1 mM; Amersham), 5× first-strand buffer (6 µl; 250 mM Tris-HCl,
pH 8.3, 375 mM KCl, 15 mM MgCl2), 0.1 M DTT (3 µl) and 2 µl of
SuperScript II reverse transcriptase (200 u/µl). After a 2-h incubation at 42
°C, the RNA was degraded by adding 1 N NaOH (1.5 µl) and incubating at
70 °C for 10 min. The mixture was neutralized by adding 1 N HCl (1.5
µl), and the volume brought to 500 µl with TE (10 mM Tris, 1 mM EDTA).
We added Cot1 human DNA (20 µg; Gibco-BRL), and purified the probe
by centrifugation in a Centricon-30 micro-concentrator (Amicon). The
two separate probes were combined, brought to a volume of 500 µl, and
concentrated again to a volume of less than 7 µl. We added 10 µg/µl
poly(A) RNA (1 µl; Sigma) and tRNA (10 µg/µl; Gibco-BRL),
and adjusted the volume to 9.5 µl with distilled water. For final probe
preparation, 20×SSC (2.1 µl; 1.5 M NaCl, 150 mM NaCitrate, pH 8.0) and
10% SDS (0.35 µl) were added to a total final volume of 12 µl. The probes
were denatured by heating for 2 min at 100 °C, incubated at 37 °C for
20-30 min, and placed on the array under a 22 mm × 22 mm glass coverslip.
We incubated slides overnight at 65 °C for 14-18 h in a custom slide chamber
with humidity maintained by a small reservoir of 3×SSC. Arrays were
washed by submersion and agitation for 2-5 min in 2×SSC with 0.1% SDS,
followed by 1×SSC and then 0.1×SSC. The arrays were "spun dry" by
centrifugation for 2 min in a slide-rack in a Beckman GS-6 tabletop centrifuge
in Microplus carriers at 650 r.p.m.

Array quantitation and data processing. Following hybridization, arrays 
were scanned using a laser-scanning microscope (ref. 17;
http://cmgm.stanford.edu/pbrown). Separate images were acquired for Cy3 and Cy5. We
carried out data reduction with the program ScanAlyze (M.B.E., available 



Cluster analysis. We extracted tables (rows of genes, columns of individual
microarray hybridizations) of normalized fluorescence ratios from the database.
Various selection criteria, discussed in relation to each data set, were
applied to select subsets of genes from the 9,703 cDNA elements on the
arrays. Before clustering and display, the logarithms of the measured fluorescence
ratios for each gene were centred by subtracting the arithmetic mean of
all ratios measured for that gene. The centring makes all subsequent analyses
independent of the amount of each gene's mRNA in the reference pool.
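
A small sketch of the centring step shows why it removes the dependence on the reference pool. The function name and the ratio values are hypothetical, and the base-2 logarithm is an assumption (the text says only "logarithm").

```python
import math

def centre_log_ratios(ratios):
    """Log-transform one gene's ratios and subtract their arithmetic mean."""
    logs = [math.log2(r) for r in ratios]
    m = sum(logs) / len(logs)
    return [x - m for x in logs]

# Hypothetical ratios for one gene across three hybridizations.
original = centre_log_ratios([1.0, 2.0, 4.0])

# Tripling the ratios (for example, one third as much of this gene's mRNA
# in the reference pool) shifts every log ratio by the same constant,
# which the centring removes.
rescaled = centre_log_ratios([3.0, 6.0, 12.0])
print(original)  # [-1.0, 0.0, 1.0]
```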

We applied a hierarchical clustering algorithm separately to the cell lines 
and genes using the Pearson correlation coefficient as the measure of simi- 
larity and average linkage clustering 3 - 1 *- 21 . The results of this process are 
two dendrograms (trees), one for the cell lines and one for the genes, in 
which very similar elements are connected by short branches, and longer
branches join elements with diminishing degrees of similarity. For visual 
display the rows and columns in the initial data table were reordered to 
conform to the structures of the dendrograms obtained from the cluster 
analysis. Each cell in the cluster-ordered data table was replaced by a graded 
colour (pure red through black to pure green), representing the mean- 
adjusted ratio value in the cell. Gene labels in cluster diagrams are dis- 
played here only for genes that were represented in the microarray by 
sequence-verified cDNAs. A complete software implementation of this 
process is available (http://rana.stanford.edu/software), as well as all clus- 
tering results (http://genome-www.stanford.edu/nci60). 
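
The clustering procedure can be sketched in pure Python. This is an illustrative toy, not the authors' software: it takes "average linkage" to mean the mean pairwise Pearson correlation between cluster members (one common definition, assumed here), and the profile names and values are hypothetical.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def average_linkage(profiles):
    """Agglomerate profiles into a nested-tuple tree (a dendrogram)."""
    # Each cluster is (tree, member names); a tree is a name or a pair of trees.
    clusters = [(name, [name]) for name in profiles]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Similarity of two clusters: mean correlation over member pairs.
            s = mean(pearson(profiles[a], profiles[b])
                     for a in clusters[i][1] for b in clusters[j][1])
            if best is None or s > best[0]:
                best = (s, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical centred profiles: a and b rise together, c and d fall together.
profiles = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.5],
    "c": [4.0, 3.0, 2.0, 1.0],
    "d": [8.0, 6.0, 4.0, 2.5],
}
tree = average_linkage(profiles)
```

The most similar pairs are joined first, so they sit on the shortest branches; the same routine can be run on the rows (genes) and, after transposing the table, on the columns (cell lines).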
I, VISHWANATH R. IYER, Ph.D., declare and state as follows:

1. I am an Assistant Professor in the Section of 
Molecular Genetics and Microbiology, Institute of Cellular and 
Molecular Biology, University of Texas at Austin, where my 
laboratory currently studies global transcriptional control in 
yeast, gene expression programs during human cell 
proliferation, and genome-wide transcription factor targets in
yeast and human. Immediately prior to this position, I spent
four years as a postdoctoral fellow in the laboratory of
Patrick O. Brown at Stanford University studying the
transcriptional programs of yeast and of human cells. My 
curriculum vitae is attached hereto as Exhibit A. 

2. Beginning in Dr. Brown's laboratory, where I 
helped to develop the first whole genome arrays for yeast and 
early versions of highly representative cDNA arrays for human 
cells, and continuing to the present day, I have used 
microarray-based gene expression analysis as a principal 
approach in much of my research. 

3. Representative publications describing this 
work include: 



DeRisi et al., "Exploring the metabolic and
genetic control of gene expression on a genomic
scale," Science 278:680-686 (1997); 1

Marton et al., "Drug target validation and
identification of secondary drug target effects
using DNA microarrays," Nature Med. 4:1293-1301
(1998); 2

Iyer et al., "The transcriptional program in
the response of human fibroblasts to serum,"
Science 283:83-87 (1999); 3 and

Ross et al., "Systematic variation in gene
expression patterns in human cancer cell lines,"
Nature Genetics 24:227-235 (2000). 4

Two of the papers describe our use of microarray-based
expression profiling to explore the metabolic reprogramming
that occurs during major environmental changes, both in yeast
(DeRisi et al., during the shift from fermentation to
respiration) and in human cells (Iyer et al., human
fibroblasts exposed to serum). One reference describes our
use of expression profile analysis in drug target validation
and identification of secondary drug effects (Marton et al.).
And one describes our use of expression profiling as a
molecular phenotyping tool to discriminate among human cancer
cells (Ross et al.).

4. Whether used to elucidate basic physiological 
responses, to study primary and secondary drug effects, or to 
discriminate and classify human cancers, expression profiling 
For example, we have demonstrated that we can
use the presence or absence of a characteristic drug
"signature" pattern of altered gene expression in drug-treated
cells to explore the mechanism of drug action, and to identify
secondary effects that can signal potentially deleterious drug
side effects. As another example, we have demonstrated that
gene expression patterns can be used to classify human tumor
cell lines. While it is of course advantageous to know the
biological function of the encoded gene products in order to
reach a better understanding of the cellular mechanisms
underlying these results, these pattern-based analyses do not
require knowledge of the biological function of the encoded
proteins.

6. The resolution of the patterns used in such
comparisons is determined by the number of genes detected: the
greater the number of genes detected, the higher the
resolution of the pattern. It goes without saying that higher
resolution patterns are generally more useful in such
comparisons than lower resolution patterns. With such higher
resolutions comes a correspondingly higher degree of
statistical confidence for distinguishing different patterns,
as well as identifying similar ones.

7. Each gene included as a probe on a microarray
provides a signal that is specific to the cognate transcript,
at least to a first approximation. 5 Each new gene-specific
probe added to a microarray thus increases the number of genes
detectable by the device, increasing the resolving power of
the device. As I note above, higher resolution patterns are
generally more useful in comparisons than lower resolution
patterns. Accordingly, each new gene probe added to a
microarray increases the usefulness of the device in gene
expression profiling analyses. This proposition is so well-established
as to be virtually an axiom in the art, and has
been as long as I have been working in the field, and
certainly since the time I embarked on the production of whole
genome arrays in early 1996. Simply put, arrays with fewer
gene-specific probes are inferior to arrays with more gene-specific
probes.

5 In a more nuanced view, [remainder of footnote illegible in the scan]

(Continued...)

8. For example, our ability to subdivide cancers 
into discriminable classes by expression profiling is limited 
by the resolution of the patterns produced. With more genes 
contributing to the expression patterns, we can potentially 
draw finer distinctions among the patterns, thus subdividing 
otherwise indistinguishable cancers into a greater number of 
classes; the greater the number of classes, the greater the 
likelihood that the cancers classified together will respond 
similarly to therapeutic intervention, permitting better 
individualization of therapy and, we hope, better treatment 
outcomes.

9. If a gene does not change expression in an
experiment, or if a gene is not expressed and produces no
signal in an experiment, that is not to say that the probe
lacks usefulness on the array; it only means that an
insufficient number of conditions have been sampled to
identify expression changes. In fact, an experiment showing
that a gene is not expressed or that its expression level does
not change can be equally informative. To provide maximum
versatility as a research tool, the microarray should
include - and as a biologist I would want my microarray to
include - each newly identified gene as a probe.

(Footnote 5, continued) without discriminating among them, and for a probe to signal the presence
of a variety of allelic variants of a single gene, again without
discriminating among them.

10. I declare further that all statements made 
herein of my own knowledge are true and that all statements 
made on information and belief are believed to be true, and 
further that these statements were made with the knowledge 
that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under 
Section 1001 of Title 18 of the United States Code and may 
jeopardize the validity of any patent application in which 
this declaration is filed or any patent that issues thereon. 
Exploring the Metabolic and Genetic Control of
Gene Expression on a Genomic Scale

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used 
to carry out a comprehensive investigation of the temporal program of gene expression 
accompanying the metabolic shift from fermentation to respiration. The expression 
profiles observed for genes with known metabolic functions pointed to features of the 
metabolic reprogramming that occur during the diauxic shift, and the expression patterns 
of many previously uncharacterized genes provided clues to their possible functions. The 
same DNA microarrays were also used to identify genes whose expression was affected 
by deletion of the transcriptional co-repressor TUP1 or overexpression of the
transcriptional activator YAP1. These results demonstrate the feasibility and utility of this
approach to genomewide exploration of gene expression patterns.



The complete sequences of nearly a dozen
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2),
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially 
favorable organism in which to conduct a
systematic investigation of gene expression. 
The genes are easy to recognize in the ge- 
nome sequence, cis regulatory elements are 
generally compact and close to the tran- 
scription units, much is already known 
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its 
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays,
containing approximately 6400 distinct DNA
sequences, were printed onto glass slides by 



using a simple robotic printing device (9). 
Cells from an exponentially growing culture 
of yeast were inoculated into fresh medium 
and grown at 30°C for 21 hours. After an 
initial 9 hours of growth, samples were har- 
vested at seven successive 2-hour intervals, 
and mRNA was isolated (10). Fluorescently 
labeled cDNA was prepared by reverse
transcription in the presence of Cy3(green)-
or Cy5(red)-labeled deoxyuridine triphos- 
phate (dUTP) (11) and then hybridized to 
the microarrays (12). To maximize the re- 
liability with which changes in expression 
levels could be discerned, we labeled cDNA 
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3- 
labeled "reference" cDNA sample prepared 
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity 
measured for the Cy3 and Cy5 fluors at 
each array element provides a reliable mea- 
sure of the relative abundance of the corre- 
sponding mRNA in the two cell popula- 
tions (Fig. 1). Data from the series of seven 
samples (Fig. 2), consisting of more than 
43,000 expression-ratio measurements, 
were organized into a database to facilitate 
efficient exploration and analysis of the 
results. This database is publicly available 
on the Internet (13). 

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the 
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels dif- 
fered by a factor of 2 or more for only 19 
genes (0.3%), and the largest of these
differences was only 2.7-fold (14). However, as
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for 
203 genes diminished by a factor of at least 
4. About half of these differentially
expressed genes have no currently recognized
function and are not yet named. Indeed, 
more than 400 of the differentially ex- 
pressed genes have no apparent homology 
to any gene whose function is known (15).
The responses of these previously
uncharacterized genes to the diauxic shift therefore
provide the first small clue to their possible
roles.

The global view of changes in
expression of genes with known functions
provides a vivid picture of the way in which
the cell adapts to a changing environ- 
ment. Figure 3 shows a portion of the yeast 
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes 
coding for the enzymes aldehyde
dehydrogenase (ALD2) and acetyl-coenzyme
A (CoA) synthase (ACS1), which
function together to convert the products of
alcohol dehydrogenase into acetyl-CoA, 
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away 
from acetaldehyde, and instead to oxalac- 
etate, where it can serve to supply the 
TCA cycle and gluconeogenesis.
Induction of the pivotal genes PCK1, encoding
phosphoenolpyruvate carboxykinase, and
FBP1, encoding fructose 1,6-biphosphatase,
switches the directions of two key
irreversible steps in glycolysis, reversing
the flow of metabolites along the reversible
steps of the glycolytic pathway toward
the essential biosynthetic precursor,
glucose-6-phosphate. Induction of the genes
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of 
genes encoding pivotal enzymes can pro- 
vide insight into metabolic reprogram- 
ming, the behavior of large groups of func- 
tionally related genes can provide a broad 
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate
cycle and carbohydrate storage, were
coordinately induced by glucose exhaustion. In
contrast, genes devoted to protein synthe- 
sis, including ribosomal proteins, tRNA 
synthetases, and translation, elongation, 
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy 
and illuminating exception was that the 



genes encoding mitochondrial ribosomal 
genes were generally induced rather than 
repressed after glucose limitation,
highlighting the requirement for mitochondrial
biogenesis (13). As more is learned about 
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment 
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of
expression could be recognized, and sets of
genes could be grouped on the basis of the 
similarities in their expression patterns. The 
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be 
inferred for sets of genes with similar expres- 
sion profiles. For example, seven genes 
showed a late induction profile, with mRNA 
levels increasing by more than ninefold at 



the last timepoint but less than threefold at
the preceding timepoint (Fig. 5B). All of
these genes were known to be
glucose-repressed, and five of the seven were previously
noted to share a common upstream activat- 
ing sequence (UAS), the carbon source re- 
sponse element (CSRE) (16-20). A search 
in the promoter regions of the remaining two 
genes, ACR1 and IDP2, revealed that
ACR1, a gene essential for ACS1 activity,
also possessed a consensus CSRE motif, but
interestingly, IDP2 did not. A search of the
entire yeast genome sequence for the con- 
sensus CSRE motif revealed only four addi- 
tional candidate genes, none of which 
showed a similar induction. 
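
The selection rule for the late-induction profile can be written as a simple filter. The sketch below is illustrative only: the gene names and expression ratios are made up, and only the two thresholds (more than ninefold at the last timepoint, less than threefold at the preceding one) come from the text.

```python
def late_induction(profiles, last_min=9.0, prev_max=3.0):
    """Return genes induced > last_min-fold at the final timepoint but
    < prev_max-fold at the timepoint before it."""
    return [gene for gene, ratios in profiles.items()
            if ratios[-1] > last_min and ratios[-2] < prev_max]

# Hypothetical expression ratios (relative to the reference) over time.
profiles = {
    "geneA": [1.0, 1.1, 1.3, 2.0, 12.0],   # late induction
    "geneB": [1.0, 2.0, 4.0, 8.0, 16.0],   # gradual induction
    "geneC": [1.0, 0.9, 0.8, 0.5, 0.3],    # repression
}
print(late_induction(profiles))  # ['geneA']
```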

Examples from additional groups of 
genes that shared expression profiles are 
illustrated in Fig. 5, C through F. The 
sequences upstream of the named genes in 
Fig. 5C all contain stress response ele- 
ments (STRE), and with the exception 




Fig. 1. Yeast genome microarray. The actual size of the microarray is 18 mm by 16 mm. The
microarray was printed as described (9). This image was obtained with the same fluorescent
scanning confocal microscope used to collect all the data we report (49). A fluorescently labeled
cDNA probe was prepared from mRNA isolated from cells harvested shortly after inoculation (culture
density of <5 × 10^6 cells/ml and media glucose level of 19 g/liter) by reverse transcription in the
presence of Cy3-dUTP. Similarly, a second probe was prepared from mRNA isolated from cells taken
from the same culture 9.5 hours later (culture density of ~2 × 10^[exponent illegible] cells/ml, with a glucose level of
[illegible]) by reverse transcription in the presence of Cy5-dUTP. In this image, hybridization of the
Cy3-dUTP-labeled cDNA (that is, mRNA expression at the initial timepoint) is represented as a green
signal, and hybridization of Cy5-dUTP-labeled cDNA (that is, mRNA expression at 9.5 hours) is
represented as a red signal. Thus, genes induced or repressed after the diauxic shift appear in this
image as red and green spots, respectively. Genes expressed at roughly equal levels before and after
the diauxic shift appear in this image as yellow spots.
of HSP42, have previously been shown to
be controlled at least in part by these
elements (21-24). Inspection of the
sequences upstream of HSP42 and the two
uncharacterized genes shown in Fig. 5C,
YKL026c, a hypothetical protein with
similarity to glutathione peroxidase, and
YGR043c, a putative transaldolase,
revealed that each of these genes also
possesses repeated upstream copies of the
stress-responsive CCCCT motif. Of the 13
additional genes in the yeast genome that
shared this expression profile [including
HSP30, ALD2, OM45, and 10
uncharacterized ORFs (25)], nine contained one or
more recognizable STRE sites in their
upstream regions.

The heterotrimeric transcriptional
activator complex HAP2,3,4 has been shown
to be responsible for induction of several
genes important for respiration (26-28).
This complex binds a degenerate consensus
sequence known as the CCAAT box (26).
Computer analysis, using the consensus
sequence TNRYTGGB (29), has suggested
that a large number of genes involved in
respiration may be specific targets of
HAP2,3,4 (30). Indeed, a putative
HAP2,3,4 binding site could be found in
the sequences upstream of each of the seven
cytochrome c-related genes that showed
the greatest magnitude of induction (Fig.
5D). Of 12 additional cytochrome c-related
genes that were induced, HAP2,3,4 binding
sites were present in all but one.
Significantly, we found that transcription of
HAP4 itself was induced nearly ninefold
concomitant with the diauxic shift.
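
A degenerate consensus such as TNRYTGGB can be scanned for by expanding the standard IUPAC nucleotide codes into a regular expression. The sketch below is an illustration, not the analysis used in the paper: the function names and the test sequence are hypothetical, and scanning both strands is an assumption.

```python
import re

# Standard IUPAC nucleotide codes (subset sufficient for this consensus).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "B": "[CGT]", "N": "[ACGT]"}

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq, consensus):
    """Return (forward-strand start, strand) for every consensus match."""
    # A lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=" + "".join(IUPAC[c] for c in consensus) + ")")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - len(consensus), "-")
             for m in pattern.finditer(rc)]
    return sorted(hits)

# Hypothetical upstream sequence containing TGATTGGC on the forward strand.
upstream = "AAATGATTGGCAAA"
print(find_sites(upstream, "TNRYTGGB"))  # [(3, '+')]
```

The consensus contains ATTGG, whose reverse complement is CCAAT, which is why a match on either strand corresponds to a CCAAT box.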

Control of ribosomal protein biogenesis
is mainly exerted at the transcriptional
level, through the presence of a common
upstream-activating element (UAS)
that is recognized by the Rap1 DNA-binding
protein (31, 32). The expression
profiles of seven ribosomal proteins are shown
in Fig. 5F. A search of the sequences
upstream of all seven genes revealed
consensus Rap1-binding motifs (33). It has
been suggested that declining Rap1 levels
in the cell during starvation may be
responsible for the decline in ribosomal
protein gene expression (34). Indeed, we
observed that the abundance of RAP1
mRNA diminished by 4.4-fold, at about
the time of glucose exhaustion.

Of the 149 genes that encode known or
putative transcription factors, only two,
HAP4 and SIP4, were induced by a factor of
more than threefold at the diauxic shift.
SIP4 encodes a DNA-binding
transcriptional activator that has been shown to
interact with Snf1, the "master regulator" of
glucose repression (35). The eightfold
induction of SIP4 upon depletion of glucose
strongly suggests a role in the induction of
downstream genes at the diauxic shift.

Although most of the transcriptional 
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were
reproducible in duplicate experiments. For
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were
measured in duplicate, independent DNA
microarray hybridizations. The correlation
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the
expression ratios measured in these duplicate
experiments differed by less than a factor
of 2. However, in a few cases, there were
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
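
The two reproducibility measures quoted above (a correlation coefficient between duplicate measurements, and the fraction of genes whose duplicate ratios differ by less than a factor of 2) can be computed as sketched below. The data are hypothetical, and computing the correlation on log2 ratios is an assumption; the text does not say which scale was used.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def fraction_within_twofold(ratios1, ratios2):
    """Fraction of genes whose duplicate ratios differ by less than 2x."""
    close = sum(1 for a, b in zip(ratios1, ratios2)
                if abs(math.log2(a / b)) < 1.0)
    return close / len(ratios1)

# Hypothetical expression ratios for four genes in duplicate hybridizations.
run1 = [1.0, 2.0, 4.0, 8.0]
run2 = [1.1, 1.9, 4.5, 20.0]
r = pearson([math.log2(v) for v in run1], [math.log2(v) for v in run2])
print(fraction_within_twofold(run1, run2))  # 0.75
```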

The changes in gene expression during 
this diauxic shift are complex and involve 
integration of many kinds of information 
about the nutritional and metabolic state 
of the cell. The large number of genes 
whose expression is altered and the diver- 
sity of temporal expression profiles ob- 
served in this experiment highlight the 
challenge of understanding the underlying 
regulatory mechanisms. One approach to 
defining the contributions of individual 
regulatory genes to a complex program of 
this kind is to use DNA microarrays to 
identify genes whose expression is affected 



Fig. 2. The section of the array indicated by the gray box in Fig. 1 is shown for each of
the experiments described here. Representative genes are labeled. In each of the arrays
used to analyze gene expression during the diauxic shift, red spots represent genes that
were induced relative to the initial timepoint, and green spots represent genes that were
repressed relative to the initial timepoint. In the arrays used to analyze the effects of
the tup1Δ mutation and YAP1 overexpression, red spots represent genes whose expression
was increased, and green spots represent genes whose expression was decreased by the
genetic modification. Note that distinct sets of genes are induced and repressed in the
different experiments. The complete images of each of these arrays can be viewed on the
Internet (13). Cell density as measured by optical density (OD) at 600 nm was used to
measure the growth of the culture.



[Fig. 2 panel labels: Growth OD 0.14; Growth OD 0.46; Growth OD 0.6]
by mutations in each putative regulatory
gene. As a test of this strategy, we analyzed
the genomewide changes in gene expression
that result from deletion of the TUP1 gene.
Transcriptional repression of many genes by
glucose requires the DNA-binding repressor
Mig1 and is mediated by recruiting the
transcriptional co-repressors Tup1 and
Cyc8/Ssn6 (39). Tup1 has also been implicated
in repression of oxygen-regulated,
mating-type-specific, and
DNA-damage-inducible genes (40).
Fig. 3. Metabolic reprogramming inferred from global analysis of changes in gene expression. Only key
metabolic intermediates are identified. The yeast genes encoding the enzymes that catalyze each step
in this metabolic circuit are identified by name in the boxes. The genes encoding succinyl-CoA synthase
and glycogen-debranching enzyme have not been explicitly identified, but the ORFs YGR244 and
YPR184 show significant homology to known succinyl-CoA synthase and glycogen-debranching
enzymes, respectively, and are therefore included in the corresponding steps in this figure. Red boxes with
white lettering identify genes whose expression increases in the diauxic shift. Green boxes with dark
green lettering identify genes whose expression diminishes in the diauxic shift. The magnitude of
induction or repression is indicated for these genes. For multimeric enzyme complexes, such as
succinate dehydrogenase, the indicated fold-induction represents an unweighted average of all the
genes listed in the box. Black and white boxes indicate no significant differential expression (less than
twofold). The direction of the arrows connecting reversible enzymatic steps indicates the direction of the
flow of metabolic intermediates, inferred from the gene expression pattern, after the diauxic shift. Arrows
representing steps catalyzed by genes whose expression was strongly induced are highlighted in red.
The broad gray arrows represent major increases in the flow of metabolites after the diauxic shift,
inferred from the indicated changes in gene expression.
Wild-type yeast cells and cells bearing
a deletion of the TUP1 gene (tup1Δ) were
grown in parallel cultures in rich medium
containing glucose as the carbon source.
Messenger RNA was isolated from
exponentially growing cells from the two
populations and used to prepare cDNA
labeled with Cy3 (green) and Cy5 (red),
respectively (11). The labeled probes were
mixed and simultaneously hybridized to
the microarray. Red spots on the microarray
therefore represented genes whose
transcription was induced in the tup1Δ
strain, and thus presumably repressed by
Tup1 (41). A representative section of the
microarray (Fig. 2, bottom middle panel)
illustrates that the genes whose expression
was affected by the tup1Δ mutation were,
in general, distinct from those induced
upon glucose exhaustion [complete images
of all the arrays shown in Fig. 2 are
available on the Internet (13)]. Nevertheless,
34 (10%) of the genes that were induced
by a factor of at least 2 after the diauxic
shift were similarly induced by deletion of
TUP1, suggesting that these genes may be
subject to TUP1-mediated repression by
glucose. For example, SUC2, the gene
encoding invertase, and all five hexose
transporter genes that were induced during the
course of the diauxic shift were similarly
induced, in duplicate experiments, by the
deletion of TUP1.

The set of genes affected by Tup1 in this
experiment also included α-glucosidases,
the mating-type-specific genes MFA1 and
MFA2, and the DNA damage-inducible
RNR2 and RNR4, as well as genes involved
in flocculation and many genes of unknown
function. The hybridization signal
corresponding to expression of TUP1 itself was
also severely reduced because of the
(incomplete) deletion of the transcription unit
in the tup1Δ strain, providing a positive
control in the experiment (42).

Many of the transcriptional targets of
Tup1 fell into sets of genes with related
biochemical functions. For instance,
although only about 3% of all yeast genes
appeared to be TUP1-repressed by a factor
of more than 2 in duplicate experiments
under these conditions, 6 of the 13 genes
that have been implicated in flocculation
(15) showed a reproducible increase in
expression of at least twofold when TUP1
was deleted. Another group of related
genes that appeared to be subject to TUP1
repression encodes the serine-rich cell
wall mannoproteins, such as Tip1 and
Tir1/Srp1, which are induced by cold
shock and other stresses (43), and similar,
serine-poor proteins, the seripauperins
(44). Messenger RNA levels for 23 of the
26 genes in this group were reproducibly
elevated by at least 2.5-fold in the tup1Δ
strain, and 18 of these genes were induced
by more than sevenfold when TUP1 was
deleted. In contrast, none of 83 genes that
could be classified as putative regulators of
the cell division cycle were induced more
than twofold by deletion of TUP1. Thus,
despite the diversity of the regulatory
systems that employ Tup1, most of the genes
that it regulates under these conditions
fall into a limited number of distinct
functional classes.
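
To give a rough sense of how unusual the flocculation result is, the counts quoted above can be compared with the roughly 3% genome-wide rate using a binomial tail probability. This is only an illustrative sketch: the text reports no such test, and the calculation assumes that genes respond independently and that 3% is the appropriate background rate.

```python
from math import comb

def binomial_tail(k, n, p):
    """Probability of at least k successes in n independent trials of rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Counts taken from the text; the background rate is the quoted "about 3%".
background_rate = 0.03

# 6 of 13 flocculation genes were induced at least twofold in the tup1 deletion.
p_flocculation = binomial_tail(6, 13, background_rate)

# None of 83 putative cell-cycle regulators were induced more than twofold.
p_no_cell_cycle_hits = (1 - background_rate) ** 83

print(f"P(>=6 of 13 by chance)  = {p_flocculation:.2e}")
print(f"P(0 of 83 by chance)    = {p_no_cell_cycle_hits:.3f}")
```

Under these assumptions the flocculation enrichment would be very unlikely to arise by chance, whereas seeing no hits among 83 cell-cycle genes is not in itself surprising at a 3% background rate.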

Because the microarray allows us to
monitor expression of nearly every gene in
yeast, we can, in principle, use this
approach to identify all the transcriptional
targets of a regulatory protein like Tup1. It
is important to note, however, that in any
single experiment of this kind we can only
recognize those target genes that are
normally repressed (or induced) under the
conditions of the experiment. For
instance, the experiment described here
analyzed a MATα strain in which MFA1
and MFA2, the genes encoding the
a-factor mating pheromone precursor, are
normally repressed. In the isogenic tup1Δ
strain, these genes were inappropriately
expressed, reflecting the role that Tup1
plays in their repression. Had we instead
carried out this experiment with a MATa
strain (in which expression of MFA1 and
MFA2 is not repressed), it would not have
been possible to conclude anything
regarding the role of Tup1 in the repression
of these genes. Conversely, we cannot
distinguish indirect effects of the chronic
absence of Tup1 in the mutant strain from
effects directly attributable to its
participation in repressing the transcription of a
gene.

Another simple route to modulating the
activity of a regulatory factor is to
overexpress the gene that encodes it. YAP1
encodes a DNA-binding transcription factor
belonging to the b-zip class of DNA-binding
proteins. Overexpression of YAP1 in
yeast confers increased resistance to
hydrogen peroxide, o-phenanthroline, heavy
metals, and osmotic stress (45). We
analyzed differential gene expression between a
wild-type strain bearing a control plasmid
and a strain with a plasmid expressing YAP1
under the control of the strong GAL1-10
promoter, both grown in galactose (that is,
a condition that induces YAP1
overexpression). Complementary DNA from the
control and YAP1-overexpressing strains,
labeled with Cy3 and Cy5, respectively, was
prepared from mRNA isolated from the two
strains and hybridized to the microarray.
Thus, red spots on the array represent genes
that were induced in the strain
overexpressing YAP1.

Of the 17 genes whose mRNA levels
increased by more than threefold when
YAP1 was overexpressed in this way, five
bear homology to aryl-alcohol
oxidoreductases (Fig. 2 and Table 1). An additional
four of the genes in this set also belong to
the general class of
dehydrogenases/oxidoreductases. Very little is known about
the role of aryl-alcohol oxidoreductases in
S. cerevisiae, but these enzymes have been
isolated from ligninolytic fungi, in which
they participate in coupled redox
reactions, oxidizing aromatic and aliphatic
unsaturated alcohols to aldehydes with the
production of hydrogen peroxide (46, 47).
The fact that a remarkable fraction of the
targets identified in this experiment
belong to the same small, functional group of
oxidoreductases suggests that these genes



Fig. 4. Coordinated regulation of
functionally related genes. The curves
represent the average induction or
repression ratios for all the genes in
each indicated group. The total number of
genes in each group was as follows:
ribosomal proteins, 112; translation
elongation and initiation



might play an important protective role
during oxidative stress. Transcription of a
small number of genes was reduced in the
strain overexpressing Yap1. Interestingly,
many of these genes encode sugar
permeases or enzymes involved in inositol
metabolism.

We searched for Yap1-binding sites
(TTACTAA or TGACTAA) in the
sequences upstream of the target genes we
identified (48). About two-thirds of the
genes that were induced more than
threefold upon Yap1 overexpression had
one or more binding sites within 600 bases
upstream of the start codon (Table 1),
suggesting that they are directly regulated by
Yap1. The absence of canonical Yap1-binding
sites upstream of the others may reflect
an ability of Yap1 to bind sites that differ
from the canonical binding sites, perhaps in
cooperation with other factors, or less
likely, may represent an indirect effect of Yap1
overexpression, mediated by one or more
intermediary factors. Yap1 sites were found
only four times in the corresponding region
of an arbitrary set of 30 genes that were not
differentially regulated by Yap1.
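
The binding-site search described above can be sketched as a simple pattern match over the 600 bases preceding the start codon. This is a minimal illustration, not the authors' procedure: the function names and the example sequence are hypothetical, and scanning both strands is an assumption, since the text does not say whether only one strand was examined.

```python
import re

# The two canonical Yap1 sites named in the text: TTACTAA or TGACTAA.
YAP1_SITE = re.compile(r"T[TG]ACTAA")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def count_yap1_sites(upstream, window=600, both_strands=True):
    """Count canonical Yap1 sites in the last `window` bases of `upstream`.

    `upstream` is assumed to be written 5'->3' and to end immediately
    before the start codon. Searching the reverse strand is an assumption.
    """
    region = upstream[-window:].upper()
    count = len(YAP1_SITE.findall(region))
    if both_strands:
        count += len(YAP1_SITE.findall(reverse_complement(region)))
    return count

# Hypothetical upstream sequence: one forward-strand site (TTACTAA) and one
# reverse-strand site (TTAGTCA is the reverse complement of TGACTAA).
example = "ACGT" * 10 + "TTACTAA" + "GGGG" + "TTAGTCA" + "CC"
print(count_yap1_sites(example))                      # both strands
print(count_yap1_sites(example, both_strands=False))  # forward strand only
```

Applying the same count to a set of genes that did not respond to Yap1 gives the kind of background comparison described in the text (four sites among 30 arbitrary genes).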

Use of a DNA microarray to character- 
ize the transcriptional consequences of 
mutations affecting the activity of
regulatory molecules provides a simple and
powerful approach to dissection and
characterization of regulatory pathways and
networks. This strategy also has an important
practical application in drug screening. 
Mutations in specific genes encoding can- 
didate drug targets can serve as surrogates 
for the ideal chemical inhibitor or modu- 
lator of their activity. DNA microarrays 
can be used to define the resulting signa- 
ture pattern of alterations in gene expres- 
sion, and then subsequently used in an 
assay to screen for compounds that repro- 
duce the desired signature pattern. 

DNA microarrays provide a simple and 
economical way to explore gene expres- 
sion patterns on a genomic scale. The 
hurdles to extending this approach to any 
other organism are minor. The equipment 




Time (hours) 

Fig. 5. Distinct temporal patterns of induction or repression help to group genes that share regulatory
properties. (A) Temporal profile of the cell density, as measured by OD at 600 nm, and glucose
concentration in the media. (B) Seven genes exhibited a strong induction (greater than ninefold) only at
the last timepoint (20.5 hours). With the exception of IDP2, each of these genes has a CSRE UAS. There
were no additional genes observed to match this profile. (C) Seven members of a class of genes marked
by early induction with a peak in mRNA levels at 18.5 hours. Each of these genes contains STRE motif
repeats in its upstream promoter region. (D) Cytochrome c oxidase and ubiquinol cytochrome c
reductase genes. Marked by an induction coincident with the diauxic shift, each of these genes contains
a consensus binding motif for the HAP2,3,4 protein complex. At least 17 genes shared a similar
expression profile. (E) SAM1, GPP1, and several genes of unknown function are repressed before the
diauxic shift, and continue to be repressed upon entry into stationary phase. (F) Ribosomal protein
genes comprise a large class of genes that are repressed upon depletion of glucose. Each of the genes
profiled here contains one or more RAP1-binding motifs upstream of its promoter. RAP1 is a
transcriptional regulator of most ribosomal proteins.






required for fabricating and using DNA
microarrays (9) consists of components
that were chosen for their modest cost and 
simplicity. It was feasible for a small group 
to accomplish the amplification of more 
than 6000 genes in about 4 months and, 
once the amplified gene sequences were in 
hand, only 2 days were required to print a 
set of 110 microarrays of 6400 elements 
each. Probe preparation, hybridization, 
and fluorescent imaging are also simple 
procedures. Even conceptually simple ex- 
periments, as we described here, can yield 
vast amounts of information. The value of 
the information from each experiment of 
this kind will progressively increase as 
more is learned about the functions of 
each gene and as additional experiments 
define the global changes in gene expres- 
sion in diverse other natural processes and 
genetic perturbations. Perhaps the greatest 
challenge now is to develop efficient 
methods for organizing, distributing, inter- 
preting, and extracting insights from the 
large volumes of data these experiments 
will provide. 
We describe here a method for drug target validation and identification of secondary drug target
effects based on genome-wide gene expression patterns. The method is demonstrated by
several experiments, including treatment of yeast mutant strains defective in calcineurin,
immunophilins or other genes with the immunosuppressants cyclosporin A or FK506. Presence or
absence of the characteristic drug 'signature' pattern of altered gene expression in drug-treated
cells with a mutation in the gene encoding a putative target established whether that target was
required to generate the drug signature. Drug-dependent effects were seen in 'targetless' cells,
showing that FK506 affects additional pathways independent of calcineurin and the
immunophilins. The described method permits the direct confirmation of drug targets and
recognition of drug-dependent changes in gene expression that are modulated through pathways
distinct from the drug's intended target. Such a method may prove useful in improving the
efficiency of drug development programs.



Good drugs are potent and specific: that is, they must have
strong effects on a specific biological pathway and minimal ef- 
fects on all other pathways. Confirmation that a compound in- 
hibits the intended target (drug target validation) and the 
identification of undesirable secondary effects are among the 
main challenges in developing new drugs. Comprehensive 
methods that enable researchers to determine which genes or 
activities are affected by a given drug might improve the effi- 
ciency of the drug discovery process by quickly identifying po- 
tential protein targets, or by accelerating the identification of 
compounds likely to be toxic. DNA microarray technology, 
which permits simultaneous measurement of the expression 
levels of thousands of genes, provides a comprehensive frame- 
work to determine how a compound affects cellular metabolism 
and regulation on a genomic scale 1-11. DNA microarrays that
contain essentially every open reading frame (ORF) in the
Saccharomyces cerevisiae genome have already been used successfully
to explore the changes in gene expression that accompany
large changes in cellular metabolism or cell cycle progression T, °. 

In the modern drug discovery paradigm, which typically be- 
gins with the selection of a single molecular target, the ideal in- 
hibitory drug is one that inhibits a single gene product so 
completely and so specifically that it is as if the gene product 
were absent. Treating cells with such a drug should induce 
changes in gene expression very similar to those resulting from 
deleting the gene encoding the drug's target. Here we have com- 
pared the genome-wide effects on gene expression that result 
from deletions of various genes in the budding yeast S. cerevisiae
to the effects on gene expression that result from treatment
with known inhibitors of those gene products. Using the cal- 
cineurin signaling pathway as a model system, we tested an ap- 
proach that permits identification of genes that encode proteins 
specifically involved in pathways affected by a drug. The FK506 
characteristic pattern, or 'signature', of altered gene expression
was not observed in mutant cells lacking proteins inhibited by 
FK506 (for example, a calcineurin or FK506-binding-protein 
mutant strain), but was observed in mutants deleted for genes 
in pathways unrelated to FK506 action (for example, a cy- 
clophilin mutant strain). Conversely, the cyclosporin A (CsA) 
signature was not observed in CsA-treated calcineurin or cy- 
clophilin mutant strains, but was seen in an FK506-binding-pro- 
tein mutant strain treated with CsA. The method also 
demonstrates that FK506, a clinically used immunosuppressant,
has 'off-target' effects that are independent of its binding to im- 
munophilins. Thus, the approach we describe may provide a 
way to identify the pathways altered by a drug and to detect 
drug effects mediated through unintended targets. 

Null mutants phenocopy drug-treated cells on a genomic scale 
To test whether a null mutation in a drug target serves as a 
model of an ideal inhibitory drug, we examined the effects on 
gene expression associated with pharmacological or genetic in- 
hibition of calcineurin function. Calcineurin is a highly con- 
served calcium- and calmodulin-activated serine/threonine 
protein phosphatase implicated in diverse processes dependent 
on calcium signaling 12,13. In budding yeast, calcineurin is required
for intracellular ion homeostasis 14, for adaptation to prolonged
mating pheromone treatment 15 and in the regulation of
Fig. 1 Model of antagonism of the calcineurin signaling pathway mediated
by FK506 and cyclosporin A (CsA). Calcineurin activity is composed of a
catalytic subunit (calcineurin A, encoded in yeast by the CNA1 and CNA2 genes),
and calcium-binding regulatory subunits calmodulin (CMD) and calcineurin B
(CnB). After entering cells, FK506 and CsA specifically bind and inhibit the
peptidyl-proline isomerase activity of their respective immunophilins, FK506
binding proteins (FKBP) and cyclophilins (CyP). The most abundant
immunophilins in yeast (Fpr1 and Cph1) are thought to mediate calcineurin
inhibition. Drug-immunophilin complexes bind and inhibit the calcium- and
calmodulin-stimulated phosphatase calcineurin. Among the substrates of
calcineurin are transcriptional activators that act to modulate gene expression.



the onset of mitosis 16. In mammals, calcineurin has been implicated
in T-cell activation 12, in apoptosis 17, in cardiac hypertrophy 18
and in the transition from short-term to long-term
memory 19. In both organisms, calcineurin activity is inhibited
by FK506 and CsA, immunosuppressant drugs whose effects on
calcineurin are mediated through families of intracellular receptor
proteins called immunophilins 12,20 (Fig. 1). To assess the effects
of pharmacologic inhibition of calcineurin, wild-type S.
cerevisiae was grown to early logarithmic phase in the presence
or absence of FK506 or CsA. Isogenic cells, from which the
genes encoding the catalytic subunits of calcineurin (CNA1 and
CNA2) had been deleted 21 (referred to as the cna or calcineurin
mutant), were grown in parallel, in the absence of the drug.
Fluorescently labeled cDNA was prepared by reverse transcription
of polyA+ RNA in the presence of Cy3- or Cy5-deoxynucleotide
triphosphates and then hybridized to a microarray
containing more than 6,000 DNA probes representing 97% of
the known or predicted ORFs in the yeast genome.
Simultaneous hybridization of Cy5-labeled cDNA from mock-treated
cells and Cy3-labeled cDNA from cells treated with 1
µg/ml FK506 allowed the effect of drug treatment on mRNA levels
of each ORF to be determined (Fig. 2a and b and data not
shown). Similarly, effects of the calcineurin mutations on the
mRNA levels of each gene were assessed by simultaneous
hybridization of Cy5-labeled cDNA from wild-type cells and
Cy3-labeled cDNA from the calcineurin mutant strain (Fig. 2c). For
each comparison of this kind, reported expression ratios are the
average of at least two hybridizations in which the Cy3 and Cy5
fluors were reversed to remove biases that may be introduced by
gene-specific differences in incorporation of the two fluors
(data not shown).
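
The reasoning behind the fluor reversal can be made concrete with a short sketch. If a gene-specific labeling bias multiplies the red/green ratio by the same factor in both hybridizations, averaging the two orientations cancels it. The function name and numbers are hypothetical, and averaging on a log scale is an assumption; the text says only that the reported ratios are averages of reversed-fluor hybridizations.

```python
import math

def dye_swap_log_ratio(rg_forward, rg_reversed):
    """Combine a reversed-fluor pair into one log10(treated/control) value.

    rg_forward:  red/green ratio with control in Cy5 (red), treated in Cy3 (green)
    rg_reversed: red/green ratio with treated in Cy5 (red), control in Cy3 (green)
    Averaging in log space is an assumption of this sketch.
    """
    log_forward = -math.log10(rg_forward)   # invert so it reads treated/control
    log_reversed = math.log10(rg_reversed)  # already treated/control
    return 0.5 * (log_forward + log_reversed)

# Hypothetical gene: truly induced 4-fold by the drug, with a gene-specific
# bias that inflates the red/green ratio 1.5-fold regardless of orientation.
true_fold = 4.0
dye_bias = 1.5
rg_forward = dye_bias * (1.0 / true_fold)  # control (red) / treated (green)
rg_reversed = dye_bias * true_fold         # treated (red) / control (green)

combined = dye_swap_log_ratio(rg_forward, rg_reversed)
print(f"recovered fold change = {10 ** combined:.2f}")
```

Either hybridization alone would misstate the change (6-fold or about 2.7-fold in this example); the averaged value recovers the underlying fold change because the bias term enters the two log ratios with opposite signs.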

Treatment with FK506 in these growth conditions resulted in
a signature pattern of altered gene expression in which mRNA
levels of 36 ORFs changed by more than twofold
(http://www.rosetta.org). A very similar pattern of altered gene
expression was observed when the calcineurin mutant strain
was compared to wild-type cells. Comparison of the changes in
mRNA expression of each gene resulting from treatment of
wild-type cells with FK506 with mRNA expression changes
resulting from deletion of the calcineurin genes showed the
considerable similarity of the global transcript alterations in
response to the two perturbations (Fig. 2o-d). Quantification of
this similarity using the correlation coefficient (ρ) showed
large correlations between the FK506 treatment signature and
the calcineurin deletion signature (ρ = 0.75 ± 0.03), as well as
the CsA treatment signature (ρ = 0.94 ± 0.02), but not with a
randomly selected deletion mutant strain (deleted for the
YER071C gene; ρ = -0.07 ± 0.04; Fig. 2e). The FK506 treatment
signature was also compared with those of more than 40 other
deletion mutant strains or drug-treatments thought to affect
unrelated pathways, and none had statistically significant
correlations. These data establish that genetic disruption of
calcineurin function provides a close and specific phenocopy of
treatment with FK506 or CsA.

To avoid generalizing from a single example, we also
compared the effects of treatment of wild-type cells with
3-aminotriazole (3-AT) with the effects of deletion of the HIS3 gene. HIS3
encodes imidazoleglycerol phosphate dehydratase, which
catalyzes the seventh step of the histidine biosynthetic pathway in
yeast 22; 3-AT is a competitive inhibitor of this enzyme that
triggers a large transcriptional amino-acid starvation response 23.
Microarray analysis of wild-type and isogenic his3-deficient
strains demonstrated the expected large genome-wide
transcriptional responses (involving more than 1,000 ORFs) resulting
from treatment with 3-AT (Fig. 3a) or from HIS3 deletion (Fig.
3c). Quantitative comparison of the 3-AT treatment signature
and the his3 mutant signature showed a high level of correlation
(ρ = 0.76 ± 0.02) that even extended to genes that experienced
small changes in expression level (Fig. 3b). As a negative control,
the correlations between the 3-AT treatment signature or the
his3 mutant signature and the calcineurin mutant strain were
not statistically significant (ρ = 0.09 ± 0.06 and -0.01 ± 0.04,
respectively). That both the calcineurin/FK506 and the his3/3-AT
comparisons were highly correlated indicates that in many cases
the expression profile resulting from a gene deletion closely
resembles the expression profile of wild-type cells treated with an
inhibitor of that gene's product.



'Decoder' strategy: Drug target validation with deletion mutants 
Because pharmacological inhibition of different targets might 
give similar or identical expression profiles, simple comparison 
of drug signatures to mutant signatures is unlikely to unambiguously
identify a drug's target. To overcome this limitation, an
additional 'decoder' step is used. We first compare the expression
profile of wild-type drug-treated cells to the expression profiles
from a panel of genetic mutant strains, using a correlation
coefficient metric. Mutant strains whose expression profile is 
similar to that of drug-treated wild-type cells are selected and 
subjected to drug treatment, generating the drug signature in 
the mutant strain (that is, the mutant drug signature). If the 
mutated gene encodes a protein involved in a pathway affected 
by the drug, we expect the drug signature in mutant cells to be 
different (or absent, for an ideal drug) from the drug signature 
seen in wild-type cells. 
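
The two-step decoder logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the strain names, log-ratio values and the two correlation cutoffs (`similar` and `lost`) are hypothetical, and the text does not specify numerical thresholds.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def decoder(wt_drug_sig, mutant_sigs, mutant_drug_sigs, similar=0.5, lost=0.3):
    """Return {mutant: True/False} for mutants passing the first step.

    Step 1: keep mutants whose own signature (mutant vs. wild type)
            correlates with the wild-type drug signature.
    Step 2: treat those mutants with drug; if the drug signature in the
            mutant no longer correlates with the wild-type drug signature,
            the mutated gene is a candidate member of the drug's pathway.
    The cutoffs `similar` and `lost` are hypothetical.
    """
    results = {}
    for name, signature in mutant_sigs.items():
        if pearson(wt_drug_sig, signature) < similar:
            continue  # mutant does not phenocopy the drug
        rho = pearson(wt_drug_sig, mutant_drug_sigs[name])
        results[name] = rho < lost
    return results

# Hypothetical log10 expression ratios for five ORFs.
wt_drug = [1.0, -0.8, 0.5, 0.0, -0.3]
mutant_sigs = {
    "target_deletion": [0.9, -0.7, 0.6, 0.1, -0.2],
    "unrelated_deletion": [0.0, 0.3, -0.1, 0.2, 0.1],
}
mutant_drug_sigs = {
    "target_deletion": [0.05, 0.02, -0.04, 0.01, 0.03],
    "unrelated_deletion": [0.9, -0.9, 0.4, 0.1, -0.2],
}

print(decoder(wt_drug, mutant_sigs, mutant_drug_sigs))
```

In this toy example only the strain deleted for the putative target passes the first step, and the drug signature is essentially absent when that strain is treated, which is the pattern the text associates with a gene in the pathway affected by the drug.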
Fig. 2 Expression profiles from FK506-treated wild-type (wt) cells and a calcineurin-disruption
mutant strain share a genome-wide correlation. DNA microarray analysis showing changes in gene
expression resulting from FK506 treatment (a and b) or from genetic disruption of genes
encoding calcineurin (c). a, Pseudo-color image of the results of simultaneous hybridization of
Cy5-labeled cDNA (red) from mock-treated strain R563 and Cy3-labeled cDNA
(green) from strain R563 treated with 1 µg/ml FK506.
b, Enlarged view of the boxed area in a. Arrowheads indicate
specific ORFs induced or repressed. c, Pseudo-color image
of the results of simultaneous hybridization
of Cy5-labeled cDNA (red) from strain R563 and
Cy3-labeled cDNA (green) from strain MCY300 (deleted for
the CNA1, CNA2 catalytic subunits of calcineurin).
Arrows indicate specific ORFs induced or repressed. d,
The log10 of the expression ratio for each ORF derived
from the FK506 treatment hybridizations is plotted versus
the log10 of the expression ratio in the calcineurin
mutant hybridizations. ORFs that were induced or
repressed in both experiments are shown as green and
red dots, respectively. e, The log10 of the expression ratio for each ORF
derived from the FK506 treatment hybridizations is plotted versus the log10
of the expression ratio in the yer071c mutant hybridizations. No ORFs
were induced or repressed in both experiments.

[Fig. 2 panel residue: panel title "wt vs. calcineurin mutant"; axis labels Log10 (R/G) calcineurin mutation and Log10 (R/G) yer071c mutation]



To illustrate this, we treated the his3 mutant strain with 3- 
AT. The signature pattern of altered gene expression resulting 
from treatment of the mutant strain with 3-AT was much less 
complex than that of the 3-AT signature in wild-type cells (Fig. 
4). This is seen simply by examining plots of mean intensity of 
the hybridization signal (which approximately reflects level of 
expression) versus the expression ratio for each ORF (Fig. 4). 
Genes that were expressed at higher or lower levels in 3-AT 
treated cells or in his3 mutant cells are shown as red and green
dots, respectively. We analyzed the 3-AT signature in wild-type
(Fig. 4a) and his3 mutant cells (Fig. 4c), as well as the his3 mutant
strain signature (Fig. 4b). Whereas histidine limitation induced
by 3-AT induced more than 1,000 transcription-level
changes in the wild-type strain, few or no transcript level
changes were induced by treatment of the his3-deletion strain
with 3-AT. This indicates that with the growth conditions used, 
essentially all of the effects of 3-AT depend on or are mediated 
through the HIS3 gene product. 

Applying this approach to the calcineurin signaling pathway 
showed the specificity of the method. The calcineurin mutant 
strain and strains with deletions in the genes encoding the 
most abundant immunophilins in yeast" (CPH1 and FPR1) 
were treated with either FK506 or CsA to determine the profiles 



Table 1 Signature correlation of expression ratios as a result of FK506
treatment in various mutant strains

                      wild-type      cna            fpr1           cna fpr1       cph1
                      +/-FK506       +/-FK506       +/-FK506       +/-FK506       +/-FK506

wild-type +/-FK506    0.93 ± 0.04    -0.01 ± 0.07   -0.23 ± 0.07   0.12 ± 0.07    0.79 ± 0.03

Signature correlation shows the absence of the FK506 signature specifically in the calcineurin (cna) and fpr1
(major FK506 binding protein) deletion mutants. cna represents the mutant with deletions of the catalytic
subunits of calcineurin, CNA1 and CNA2. The correlation coefficient reported in the first column represents the
correlation between two pairs of hybridizations from independent wild-type +/- FK506 experiments.



of altered gene expression resulting from drug treatment of the
mutant cells (that is, mutant +/- drug). We compared the drug
signatures in the mutants to the wild-type drug signature using
the correlation coefficient metric (Table 1). Although the signature
generated by treatment of wild-type cells with FK506 was
highly correlated to the calcineurin mutant strain signature (ρ
= 0.75 ± 0.03), it bore no similarity to the profile after treatment
of the calcineurin mutant strain with FK506 (ρ = -0.01 ±
0.07). This indicates that FK506 was unable to elicit its normal
transcriptional response in the calcineurin mutant strain.
Likewise, treatment of the fpr1 mutant strain with FK506
elicited an expression profile that was not correlated to the
FK506 signature in the wild-type strain (ρ = -0.23 ± 0.07),
indicating that the FPR1 gene product is likely to be involved in the
pathway affected by FK506. The same was true for the cna fpr1
mutant strain. In contrast, treatment of the cph1 mutant strain
with FK506 generated an expression profile highly correlated
with the wild-type FK506 expression profile (ρ = 0.79 ± 0.03),
indicating that the cph1 mutation did not block the mode of action
of FK506 and thus that the CPH1 gene product is not directly
involved in the pathway affected by FK506. We tabulated the change
in expression in response to FK506 in different mutant strains for
all ORFs with expression ratios greater than 1.8 in FK506-treated
cells or in the calcineurin mutant strain (Fig. 5a). The
calcineurin mutant strain signature and the
FK506 responses in wild-type and the cph1
mutant strain are similar, and there are no
transcript-level changes (seen in black) for
treatment of the calcineurin, fpr1 and cna
fpr1 mutant strains with FK506 (Fig. 5a).

Similar experiments and analyses with CsA 
provided further validation of this approach. 
The expression profile elicited by treatment 
of wild-type cells with CsA was highly corre- 



Fig. 3 Expression profiles from a his3 mutant strain and wild-type (wt) cells
treated with 3-AT share a genome-wide correlation.
DNA microarray analysis showing changes in gene
expression resulting from 3-AT treatment (a) or from
genetic disruption of the HIS3 gene (c). a, Pseudo-color
image of the results of simultaneous hybridization of
Cy5-labeled cDNA (red) from mock-treated wild-type strain R491 and
Cy3-labeled cDNA (green) from strain R491 treated with 10 mM 3-AT.
b, The log10 of the expression ratio for each ORF derived from the
3-AT treatment hybridizations is plotted versus the log10 of the expression
ratio in the his3 mutant hybridizations. ORFs that were induced or
repressed in both experiments are shown as green and red dots,
respectively. The correlation of expression ratios applies not only to genes with
large expression ratios (for example, CHA1 and ARG1), but also extends to
genes with expression ratios less than 2 (for example, ILV1 and CPH1).
ILV1 is induced 1.9-fold and 1.5-fold, and CPH1 is downregulated 1.9-fold
and 1.7-fold, in cells treated with 3-AT and his3 mutant cells, respectively.
Two ORFs do not fall on the line x = y. The leftmost point is the HIS3 data
point, which is induced by 3-AT treatment but which is not absent from
the his3 mutant strain. The other point is YOR203w. Both data points are
labeled HIS3 because hybridization to YOR203w is most likely due to HIS3
mRNA, as YOR203w overlaps the HIS3 open reading frame. c, Pseudo-color
image of the results of simultaneous hybridization of Cy5-labeled
cDNA (red) from wild-type strain R491 and Cy3-labeled cDNA (green)
from strain R1226, deleted for the HIS3 gene. Arrowheads indicate
specific ORFs induced or repressed.

[Fig. 3 panel residue: panel title "wt vs. his3 mutation"; axis label Log10 (R/G) mutation]



lated to the profile elicited by mutation of the calcineurin genes 
(ρ = 0.71 ± 0.04), but did not correlate with the expression
profile resulting from treatment of the calcineurin mutant strain
with CsA (ρ = -0.05 ± 0.07; Table 2), indicating that the genetic
deletion of calcineurin interfered with the ability of CsA to
elicit its normal transcriptional response. Likewise, the CsA
signature was essentially absent in CsA-treated cph1 mutant cells,
and the expression profile of CsA-treated cph1 mutant cells
correlated poorly to that of CsA-treated wild-type cells
(ρ = 0.18 ± 0.07). Thus, the CPH1 gene product was required for
the CsA response seen in wild-type cells. Conversely, treatment of
fpr1 mutant cells with CsA resulted in an expression pattern very
similar to the profile of CsA-treated wild-type cells
(ρ = 0.77 ± 0.03), indicating that FPR1 was not necessary for the
CsA-mediated effects. Analysis of individual ORFs affected by CsA
and their expression ratios over the entire set of experiments
confirmed that CPH1 and the genes encoding calcineurin, but not



[Panel a label: wt +/- 10 mM 3-AT; horizontal axis: log10 (intensity)]

Fig. 4 Treatment of the his3 mutant strain with 3-AT shows nearly
complete loss of 3-AT signature. A plot of the log10 of the mean
intensity of hybridization for each ORF versus the log10 of its
expression ratio for each experiment is shown next to a
pseudo-color image of a representative portion of the microarray.
ORFs that are induced or repressed at the 95% confidence level are
shown in green and red, respectively. a, Expression profile from
treatment of the wild-type (wt) strain with 3-AT. Cy5-labeled cDNA
(red) from mock-treated strain R491 and Cy3-labeled cDNA (green)
from strain R491 treated with 10 mM 3-AT. b, Expression profile

FPR1, are necessary for the wild-type CsA response (Fig. 5b). The
observation that the profiles resulting from FK506 or CsA drug
treatment are similar to that of the calcineurin deletion mutant
strain might allow the prediction that calcineurin was involved in
the pathway affected by these drugs. But because the expression
profile of the fpr1 mutant strain did not bear a strong similarity
to the wild-type drug expression profile for FK506, it is obvious
that the drug treatment of the mutant strains was necessary to
identify Fpr1, but not Cph1, as a potential FK506 drug target. In
the same way, the 'decoder' strategy was necessary to identify
Cph1, but not Fpr1, as a potential drug target for CsA.

'Decoder' approach can identify secondary drug effects
For a drug that has a single biochemical target, the strategy
outlined above may be useful in target validation. In many cases,
however, a compound may affect multiple pathways and elicit a very
complex signature. 'Decoding' such a complex signature



from the his3 deletion strain. Cy5-labeled cDNA (red) from strain
R491 and Cy3-labeled cDNA (green) from strain R1226, deleted for
the HIS3 gene. c, Expression profile of treatment of the his3
deletion strain with 3-AT. Cy3-labeled cDNA (red) from
his3-deleted strain R1226 and Cy5-labeled cDNA (green) from strain
R1226 treated with 10 mM 3-AT. Arrowheads indicate the DNA probe
and data point corresponding to the HIS3 gene. The blue dashed
line represents the threshold below which errors tend to increase
rapidly because spot intensities are not sufficiently above
background intensity.
[Panel c label: his3 mutant +/- 10 mM 3-AT; horizontal axis: log10 (intensity)]

Table 2 Signature correlation of expression ratios as a result of
CsA treatment in various mutant strains

Profile compared with wild-type +/- CsA    Correlation (ρ)
wild-type +/- CsA                          0.94 ± 0.04
cna +/- CsA                                -0.05 ± 0.07
fpr1 +/- CsA                               0.77 ± 0.03
cna cph1 +/- CsA                           -0.[illegible] ± 0.07
cph1 +/- CsA                               0.18 ± 0.07

Signature correlation shows the absence of the CsA signature
specifically in the calcineurin (cna) and cyclophilin (cph1)
mutant strains. cna, the mutant with deletions of the catalytic
subunits of calcineurin, CNA1 and CNA2. The correlation
coefficient reported in the first column represents the
correlation between two pairs of hybridizations from independent
wild-type +/- CsA experiments.


into the effects mediated through the intended target (the
'on-target' signature) and those mediated through unintended
targets (the 'off-target' signature) might be useful in evaluating
a compound's specificity. Our 'decoder' strategy is based on the
premise that the 'off-target' signature should be insensitive to
the genetic disruption of the primary target.

To determine whether the 'decoder' approach could identify an
'off-target' profile, we looked for a drug-responsive gene whose
expression is insensitive to deletion of the primary target. To
increase the likelihood of observing such genes, the same strains
described in Tables 1 and 2 were treated with higher
concentrations (50 µg/ml) of FK506. This led to a much more
complex expression profile in wild-type cells, indicating that at
this higher concentration, FK506 was inhibiting or activating
additional targets. Several of the ORFs in this expanded
FK506-induced expression profile were not affected by the
calcineurin, cph1 or fpr1 mutations, as drug treatment of these
mutant strains did not block their presence in the FK506
expression signature (Fig. 6). This indicates that FK506 was
triggering changes in transcript levels of many genes through
pathways independent of calcineurin, CPH1 and FPR1. Many of the
upregulated ORFs in the 'off-target' pathway were genes reported
to be regulated by the transcriptional activator Gcn4 (ref. 24).
In some strains, a reporter gene under GCN4 control was induced in
response to FK506 treatment. To determine whether GCN4 is involved
in this pathway that is independent of calcineurin, CPH1 and FPR1,
we analyzed the effects of treatment with high-dose FK506 on
global gene expression in a strain with a GCN4 deletion (Fig. 6).
Of the 41 ORFs with calcineurin-independent expression ratios
greater than 4, 32 were not induced in the gcn4 mutant, indicating
that their induction by FK506 was GCN4-dependent. Not all
GCN4-regulated genes were induced by FK506. This FK506-induced
subset of GCN4-regulated genes may be those most sensitive to
subtle changes in Gcn4 levels, or perhaps other regulatory
circuits prevent FK506 activation of some GCN4-regulated genes.
Seven of the remaining nine ORFs induced by FK506 were independent
of

Fig. 5 Response of FK506 and CsA signature genes in strains with
deletions in different genes. Genes with expression ratios greater
than a factor of 1.8 in response to treatment with 1 µg/ml FK506
(a) or 50 µg/ml CsA (b) are listed (left side) and their
expression ratios in the indicated strain are shown on the green
(induction)-red (repression) color scale. a, Calcineurin (cna)
mutant and FK506 treatment signature genes are in the first two
columns. Almost all FK506 signature genes have expression ratios
near unity in deletion strains involved in pathways affected by
FK506 (calcineurin, fpr1 and cna fpr1 mutants) but not in deletion
strains in unrelated pathways (cph1). b, Calcineurin (cna) mutant
and CsA treatment signature genes are in the first two columns.
Almost all CsA signature genes have expression ratios near unity
in deletion strains involved in pathways affected by CsA
(calcineurin, cph1 and cna cph1 mutants) but not in deletion
strains in unrelated pathways (fpr1).

both the calcineurin and GCN4 pathways. The simplest explanation
is that FK506 inhibits or activates additional pathways. Members
of this class include SNQ2 and PDR5, genes that encode drug efflux
pumps with structural homology to mammalian multiple drug
resistance proteins. FK506 may interact directly with Pdr5 to
inhibit its function. Our results indicate that treatment with
FK506 leads to fourfold-to-sixfold induction of PDR5 mRNA levels.
YOR1, another gene that can confer drug resistance, is also
induced threefold-to-fourfold by FK506. Thus, drug treatment of
strains with mutations in the primary targets can prove useful in
identifying effects mediated by secondary drug targets, including
the nature and extent of newly discovered and previously
unsuspected pathways affected by the drug.

We describe here a method for drug target validation and the
identification of secondary drug target effects that uses DNA
microarrays to survey the effects of drugs on global gene
expression patterns. We established that genetic and pharmacologic
inhibition of gene function can result in extremely similar
changes in gene expression. We also demonstrated that one can
confirm a potential drug target by treating a deletion mutant
defective in the gene encoding the putative target. Drug-mediated
signatures from strains with mutations in pathways or processes
directly or indirectly affected by the drug bore little or no
similarity to the wild-type drug expression profile. In contrast,
drug-mediated signatures from strains with mutations in genes
involved in pathways unrelated to the drug's action showed
extensive similarity to the wild-type drug signature. By applying
this approach to a drug that affects multiple pathways (FK506), we
were able to decode a complex signature into component parts,
including the identification of an 'off-target' signature that was
mediated through pathways independent of calcineurin or the Fpr1
immunophilin.

Discussion 

It is well-established that high-throughput biochemical screening
can identify potent inhibitory compounds against a given target.
The 'decoder' approach described here complements this process by
evaluating the equally important property of specificity: the
tendency of a compound to inhibit pathways other than that of its
intended target. The ability to observe such 'off-target' effects
will likely be useful in several ways. Profiling compounds with
known toxicities will allow the development of a database of
expression changes associated with particular toxicities.
Recognition of potential toxicities in the 'off-target' signatures
of otherwise promising compounds then may allow earlier
identification of those likely to fail in clinical trials.
Comparing the extent and peculiarities of 'off-target' signatures
of promising drug candidates could provide a new way to group
compounds by their effects on secondary pathways, even before
those effects are understood. This may prove to be an alternative,
potentially more effective, way to select compounds for animal and
clinical trials. Some drugs are more effective against a related
protein than against the originally intended target. Sildenafil
(Viagra™), for example, was initially developed as a
phosphodiesterase inhibitor to control cardiac contractility, but
was found to be highly specific for phosphodiesterase 5, an
isozyme whose inhibition overcomes defects in

Fig. 6 Response of FK506 signature genes in strains with deletions
in different genes. Genes with expression ratios greater than a
factor of 4 in at least one experiment are listed and their
expression ratios in the indicated strain are shown in the green
(induction)-red (repression) color scale. The genes have been
divided into classes corresponding to these expected behaviors:
'CNA-dependent' genes respond to FK506 (50 µg/ml) except when
either calcineurin genes or FPR1 or both are deleted;
'GCN4-dependent' genes respond to FK506 except when GCN4 is
deleted. These genes still respond to FK506 when calcineurin genes
or FPR1 or CPH1 are deleted; that is, their responses are not
mediated by calcineurin, Cph1, or Fpr1. 'CNA- and
GCN4-independent' genes respond to FK506 in all deletion strains
tested. A 'complex behavior' class is provided for those genes
that did not match the model of FK506 response mediated through
calcineurin or Fpr1 or separately through Gcn4.



penile erection. It is possible that application of the 'decoder'
to other compounds may show that they too have a potent activity
against a target distinct from their intended target.

The ability to decode drug effects is dependent on the
availability of functionally 'targetless' cells. In yeast, this is
being achieved by systematically disrupting each yeast gene
(Saccharomyces Deletion Consortium:
http://sequence-www.stanford.edu/group/yeast_deletion_project/deletion.html).
Efforts are underway to obtain expression profiles from each
deletion mutant strain. Determining signatures resulting from
inactivation of essential genes presents a unique problem, but it
may be possible to do so by examining heterozygotes or by using a
controllable promoter to reduce expression of the essential gene.
Although it is already feasible to test several compounds in
dozens of yeast strains, another challenge for the 'decoder'
strategy will be the efficient selection of the mutants with
deletions in genes most likely to encode the intended drug target.
The signature correlation plots described are one metric that
could be used as part of that selection process, but others need
to be explored. Applying the 'decoder' to mammalian cells presents
additional challenges. It is considerably more difficult to
isolate functionally 'targetless' cells. Strategies involving
titratable promoters, known specific inhibitors, anti-sense RNAs,
ribozymes, and methods of targeting specific proteins for
degradation are possible and should be tested. Another limitation
is that not all cell types express the same set of genes and
therefore 'off-target' effects may be different in different cell
types. In addition, applying the 'decoder' to human cells will
also require technical improvements that allow expression
profiling from a small number of cells. Even the broader question
of whether the insensitivity of 'off-target' signatures to the
disruption of the main target is the exception or the rule can
only be answered by the accumulation of more data. Barkai and
Leibler, however, have argued in favor of robustness of biological
networks, indicating that drug perturbations ('off-target'
signatures) may be robust even when the system is subjected to
another perturbation (such as a genetic disruption) (ref. 28).
Many practical developments will be necessary if the 'decoder'
concept is to be broadly applied.

Expression arrays have been used mainly as an initial screen 
for genes induced in a particular tissue or process of interest by 
focusing on genes with large expression ratios. We have 
found, however, that effort to refine experimental protocols 
and repeat experiments increases the reliability of the data and 
permits new applications. For example, it provides a larger set

Table 3 Yeast strains used

Strain    Relevant genotype    Reference
YPH499    MATa ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1    (34)
R563      MATa ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3    (this study)
R558      MATa ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 fpr1::HIS3    (this study)
R567      MATa ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cph1::HIS3    (this study)
MCY300    MATa ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3    (21)
R132      MATa ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 cph1::kanr    (this study)
R133      MATa ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 fpr1::kanr    (this study)
R559      MATa ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3 gcn4::LEU2    (this study)
BY4719    MATa trp1-Δ63 ura3-Δ0    (35)
BY4738    MATα trp1-Δ63 ura3-Δ0    (35)
R491      MATa/α BY4719 × BY4738    (this study)
BY4728    MATa his3-Δ200 trp1-Δ63 ura3-Δ0    (35)
BY4729    MATα his3-Δ200 trp1-Δ63 ura3-Δ0    (35)
R1226     MATa/α BY4728 × BY4729    (this study)

of genes at higher confidence levels that serve as a more unique
signature for a given protein perturbation. In addition, it allows
subtle signatures to be detected when, for example, a protein is
only partially inhibited. This may enable clinical monitoring of
small changes in protein function in disease or toxicity states
before they could otherwise be detected. Because the functions of
many genes detected on transcript arrays are known, these
microarrays are powerful tools that provide detailed information
about a cell's physiology. For example, changes in the flux
through a metabolic pathway are reflected in transcriptional
changes in genes in the pathway (ref. 7).

Furthermore, it may be possible to indirectly measure protein
activity levels from expression profiling data (S.F. et al.,
unpublished data). Thus, although the eventual development of
genomic methods allowing the direct measurement of all cellular
protein levels will be an important achievement, transcript array
technology offers an immediate and robust means of evaluating the
effects of various treatments on gene expression and protein
function.

Methods

Construction, growth and drug treatment of yeast strains. The
strains used in this study (Table 3) were constructed by standard
techniques. To construct strain R559, strain R563 was transformed
to Leu+ with plasmid pM12 digested by SalI and MluI (provided by
A. Hinnebusch and T. Dever). Strains R132 and R133 were
constructed by transforming the bacterial kanamycin resistance
cassette flanked by genomic DNA from the CPH1 and FPR1 loci,
respectively, and selecting for G418-resistant colonies. For
experiments with FK506, cells were grown for three generations to
a density of 1 × 10^7 cells/ml in YAPD medium (YPD plus 0.004%
adenine) supplemented with 10 mM calcium chloride as described.
Where indicated, FK506 was added to a final concentration of
1 µg/ml 0.5 h after inoculation of the culture or to 50 µg/ml 1 h
before cells were collected. CsA was used at a final concentration
of 50 µg/ml. Cells were broken by standard procedures with the
following modifications: Cell pellets were resuspended in breaking
buffer (0.2 M Tris-HCl pH 7.6, 0.5 M NaCl, 10 mM EDTA, 1% SDS),
vortexed for 2 min on a VWR multi-tube vortexer at setting 8 in
the presence of 60% glass beads (425-600 µm mesh; Sigma) and
phenol:chloroform (50:50, volume/volume). After separation of the
phases, the aqueous phase was re-extracted and
ethanol-precipitated. Poly(A)+ RNA was isolated by two sequential
chromatographic purifications over oligo dT cellulose (New England
Biolabs, Beverly, Massachusetts) using established protocols.

For experiments using 3-AT, wild-type or his3/his3 cells were
grown to early logarithmic phase in SC medium, pelleted and
resuspended in SC medium lacking histidine for 1 hr in the
presence or absence of 10 mM 3-AT, as indicated. Cells were
harvested and mRNA isolated as above. FK506 was obtained from the
Swedish Hospital Pharmacy (Seattle, Washington) and purified to
homogeneity by ethyl acetate extraction by J. Simon (Fred
Hutchinson Cancer Research Center, Seattle, Washington). CsA was
obtained from Alexis Biochemicals (San Diego, California); 3-AT
was from Sigma.

Preparation and hybridization of the labeled sample.
Fluorescently-labeled cDNA was prepared, purified and hybridized
essentially as described (ref. 7). Cy3- or Cy5-dUTP (Amersham) was
incorporated into cDNA during reverse transcription (Superscript
II; Life Technologies) and purified by concentrating to less than
10 µl using Microcon-30 microconcentrators (Amicon, Houston,
Texas). Paired cDNAs were resuspended in 20-26 µl hybridization
solution (3× SSC, 0.75 µg/ml polyA DNA, 0.2% SDS) and applied to
the microarray under a 22 × 30-mm coverslip for 6 h at 63 °C, all
according to a published method.

Fabrication and scanning of microarrays. PCR products containing
common 5' and 3' sequences (Research Genetics, Huntsville,
Alabama) were used as templates with amino-modified forward primer
and unmodified reverse primers to PCR amplify 6,065 ORFs from the
S. cerevisiae genome. Our first-pass success rate was 94%.
Amplification reactions that gave products of unexpected sizes
were excluded from subsequent analysis. ORFs that could not be
amplified from purchased templates were amplified from genomic
DNA. DNA samples from 100-µl reactions were
isopropanol-precipitated, resuspended in water, brought to a final
concentration of 3× SSC in a total volume of 15 µl, and
transferred to 384-well microtiter plates (Genetix Limited,
Christchurch, Dorset, England). PCR products were spotted onto
1 × 3-inch polylysine-treated glass slides by a robot built
essentially according to defined specifications
(http://cmgm.stanford.edu/pbrown/MGuide). After being printed,
slides were processed according to published protocols.

Microarrays were imaged on a prototype multi-frame CCD camera in
development at Applied Precision (Issaquah, Washington). Each CCD
image frame was approximately 2-mm square. Exposure times of 2 s
in the Cy5 channel (white light through Chroma 618-648 nm
excitation filter, Chroma 657-727 nm emission filter) and 1 s in
the Cy3 channel (Chroma 535-560 nm excitation filter, Chroma
570-620 nm emission filter) were done consecutively in each frame
before moving to the next, spatially contiguous frame. Color
isolation between the Cy3 and Cy5 channels was about 100:1 or
better. Frames were 'knitted' together in software to make the
complete images. The intensity of spots (about 100 µm) was
quantified from the 10-µm pixels by frame-by-frame background
subtraction and intensity averaging in each channel. Dynamic range
of the resulting spot intensities was typically a ratio of 1,000
between the brightest spots and the background-subtracted additive
error level. Normalization between the channels was accomplished
by normalizing each channel to the mean intensities of all genes.
This procedure is nearly equivalent to normalization between
channels using the intensity ratio of genomic DNA spots, but is
possibly more robust, as it is based on the intensities of several
thousand spots distributed over the array.
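
The channel normalization described above can be sketched in a few lines. This is a minimal illustration only, not the authors' software: the function name, the choice of Cy5 as numerator, and the intensity values are assumptions made for the example.

```python
from statistics import mean

def normalize_channels(cy5, cy3):
    """Scale each channel by its mean intensity over all spots,
    then return the per-spot Cy5/Cy3 ratio of the scaled values."""
    mean5, mean3 = mean(cy5), mean(cy3)
    return [(r / mean5) / (g / mean3) for r, g in zip(cy5, cy3)]

# Hypothetical background-subtracted spot intensities. The Cy5 channel
# is uniformly twice as bright, as a pure scanner or labeling effect.
cy3 = [100.0, 200.0, 400.0, 300.0]
cy5 = [2.0 * v for v in cy3]

ratios = normalize_channels(cy5, cy3)
print(ratios)
```

Because a uniform brightness difference between channels is divided out by the channel means, every spot in this toy data set ends up with a ratio of one.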

Signature correlation coefficients and their confidence limits.
Correlation coefficients between the signature ORFs of various
experiments were calculated using:

$$\rho = \frac{\sum_k x_k y_k}{\left(\sum_k x_k^2 \, \sum_k y_k^2\right)^{1/2}}$$

where x_k is the log10 of the expression ratio for the kth gene in
the x signature, and y_k is the log10 of the expression ratio for
the kth gene in the y signature. The summation is over those genes
that were either up- or down-regulated in either experiment at the
95% confidence level. These
genes each had a less than 5% chance of being actually unregulated (hav- 
ing expression ratios departing from unity due to measurement errors 
alone). This confidence level was assigned based on an error model which 
assigns a lognormal probability distribution to each gene's expression 
ratio with characteristic width based on the observed scatter in its re- 
peated measurements (repeated arrays at the same nominal experimental 
conditions) and on the individual array hybridization quality. This latter 
dependence was derived from control experiments in which both Cy3 
and Cy5 samples were derived from the same RNA sample. For large 
numbers of repeated measurements the error reduces to the observed 
scatter. For a single measurement the error is based on the array quality 
and the spot intensity. 
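
As a rough sketch of the calculation just described, the correlation is the sum over selected genes of the product of the two log10 expression ratios, divided by the square root of the product of the two sums of squares. The code below assumes the set of '95% confidence' genes is supplied by the caller; the error model that assigns those confidence levels is not reproduced, and the ORF names and ratios are hypothetical.

```python
import math

def signature_correlation(x, y, significant):
    """Uncentered correlation of log10 expression ratios.

    x, y: dicts mapping gene name -> expression ratio.
    significant: genes regulated in either experiment (chosen upstream).
    """
    num = sxx = syy = 0.0
    for gene in significant:
        lx, ly = math.log10(x[gene]), math.log10(y[gene])
        num += lx * ly
        sxx += lx * lx
        syy += ly * ly
    return num / math.sqrt(sxx * syy)

# Hypothetical expression ratios for three ORFs.
x = {"ORF_A": 4.0, "ORF_B": 0.5, "ORF_C": 1.1}
y_same = dict(x)                                   # identical profile
y_opposite = {g: 1.0 / r for g, r in x.items()}    # inverted profile

selected = ["ORF_A", "ORF_B"]
rho_same = signature_correlation(x, y_same, selected)
rho_opposite = signature_correlation(x, y_opposite, selected)
print(rho_same, rho_opposite)
```

Note that the means are not subtracted, so a profile compared with itself gives unity and an unregulated gene (ratio of one, log of zero) contributes nothing to the sums.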

Random measurement errors in the x and y signatures tend to bias the 
correlation towards zero. In most experiments, most genes are not signif- 
icantly affected but do show small random measurement errors. Selecting 
only the '95% confidence' genes for the correlation calculation, rather 
than the entire genome, reduces this bias and makes the actual biological 
correlations more apparent. 
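
The bias described above can be seen in a small simulation. This is a sketch under stated assumptions: the gene counts, effect size, noise level and the simple threshold used to pick 'significant' genes are invented for illustration and stand in for the error model of the text.

```python
import math
import random

def uncentered_corr(x, y):
    """Sum of products divided by the root of the product of sums of squares."""
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

rng = random.Random(0)
n_genes, n_signal, noise_sd = 1000, 50, 0.1

# True log10 ratios: 50 regulated genes shared by both experiments,
# the remaining 950 genes unregulated.
true = [0.5 if i % 2 == 0 else -0.5 for i in range(n_signal)]
true += [0.0] * (n_genes - n_signal)

# Two measurements of the same biology with independent random errors.
x = [t + rng.gauss(0.0, noise_sd) for t in true]
y = [t + rng.gauss(0.0, noise_sd) for t in true]

rho_all = uncentered_corr(x, y)

# Keep genes that depart from zero by more than 3 noise SDs in either
# experiment (a stand-in for the '95% confidence' selection).
keep = [i for i in range(n_genes)
        if abs(x[i]) > 3 * noise_sd or abs(y[i]) > 3 * noise_sd]
rho_selected = uncentered_corr([x[i] for i in keep], [y[i] for i in keep])
print(rho_all, rho_selected)
```

Although the underlying biology is identical in both simulated experiments, the whole-genome correlation is pulled toward zero by the many unregulated genes, while the correlation over the selected genes stays much closer to unity.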

Correlations between a profile and itself are unity by definition.
Error limits on the correlation are 95% confidence limits based on
the individual measurement error bars, and assuming uncorrelated
errors. They do not include the bias mentioned above; thus, a
departure of ρ from unity does not necessarily mean that the
underlying biological correlation is imperfect. However, a
correlation of 0.7 ± 0.1, for example, is very significantly
different from zero. Small (magnitude of ρ < 0.2) but formally
significant correlations in the tables and text probably are due
to small systematic biases in the Cy5/Cy3 ratios that violate the
assumption of independent measurement errors used to generate the
95% confidence limits. Therefore, these small correlation values
should be treated as not significant. A likely source of
uncorrected systematic bias is the partially corrected scanner
detector nonlinearity that differently affects the Cy3 and Cy5
detection channels.

The 1 µg/ml FK506 treatment signature was compared with more than
40 unrelated deletion mutant strain or drug signatures. These
control profiles had correlation coefficients with the FK506
profile that were distributed around zero (mean ρ = -0.03) with a
standard deviation of 0.16 (data not shown), and none had
correlations greater than ρ = 0.38. Similarly, the calcineurin
mutant strain signature correlated well with the CsA treatment
signature (ρ = 0.71 ± 0.04) but not with the signatures from the
negative controls (mean ρ = -0.02 with a standard deviation of
0.18).

Quality controls. End-to-end checks on expression ratio
measurement accuracy were provided by analyzing the variance in
repeated hybridizations using the same mRNA labeled with both Cy3
and Cy5, and also using Cy3 and Cy5 mRNA samples isolated from
independent cultures of the same nominal strain and conditions.
Biases undetected with this procedure, such as gene-specific
biases presumably due to differential incorporation of Cy3- and
Cy5-dUTP into cDNA, were minimized by doing hybridizations in
fluor-reversed pairs, in which the Cy3/Cy5 labeling of the
biological conditions was reversed in one experiment with respect
to the other. The expression ratio for each gene is then the ratio
of ratios between the two experiments in the pair. Other biases
are removed by algorithmic numerical de-trending. The magnitude of
these biases in the absence of de-trending and fluor reversal is
typically about 30% in the ratio, but may be as high as twofold
for some ORFs.
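
A toy calculation shows why fluor reversal cancels a gene-specific dye bias. The multiplicative bias model and the numbers are assumptions for illustration. The text specifies only a 'ratio of ratios'; taking the square root of that quotient to return to the scale of a single expression ratio is an added assumption here, not something the source states.

```python
import math

def fluor_reversed_ratio(r_forward, r_reversed):
    """Combine a fluor-reversed pair of Cy5/Cy3 ratios.

    r_forward:  treated sample in Cy5, control in Cy3.
    r_reversed: control in Cy5, treated sample in Cy3.
    The square root (an assumption) rescales the ratio of ratios.
    """
    return math.sqrt(r_forward / r_reversed)

# Hypothetical gene: truly induced twofold, with a 30% bias that
# favors the Cy5 channel regardless of which sample carries it.
true_ratio, dye_bias = 2.0, 1.3
r_forward = true_ratio * dye_bias           # measured 2.6
r_reversed = (1.0 / true_ratio) * dye_bias  # measured 0.65

combined = fluor_reversed_ratio(r_forward, r_reversed)
print(combined)
```

The bias multiplies both measurements in the same direction, so it divides out of the quotient and the combined value recovers the twofold induction.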

Expression ratios are based on mean intensities over each spot.
Some smaller spots have fewer image pixels in the average. This
does not degrade accuracy noticeably until the number of pixels
falls below ten, in which case the spot is rejected from the data
set. 'Wander' of spot positions with respect to the nominal grid
is adaptively tracked in many subregions by the image processing
software. Unequal spot 'wander' within a subregion greater than
half-a-spot spacing is a difficulty for the automated quantitating
algorithms; in this case, the spot is rejected from analysis based
on human inspection of the 'wander'. Any spots partially
overlapping are excluded from the data set. Less than 1% of spots
typically are rejected for these reasons.
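
The rejection rules above amount to a simple filter. The sketch below is illustrative only: the `Spot` record, its field names and the example spots are hypothetical, and the 'wander' flag is taken as an input because the text says that decision rests on human inspection.

```python
from dataclasses import dataclass

MIN_PIXELS = 10  # spots averaged over fewer pixels than this are rejected

@dataclass
class Spot:
    orf: str
    n_pixels: int
    overlaps_neighbor: bool = False
    wander_rejected: bool = False  # set after manual inspection

def keep_spot(spot):
    """Return True if the spot passes all three rejection rules."""
    return (spot.n_pixels >= MIN_PIXELS
            and not spot.overlaps_neighbor
            and not spot.wander_rejected)

# Hypothetical spots covering each rule.
spots = [
    Spot("ORF_A", 80),
    Spot("ORF_B", 9),                           # too few pixels
    Spot("ORF_C", 75, overlaps_neighbor=True),  # partial overlap
    Spot("ORF_D", 10),                          # exactly at the limit
    Spot("ORF_E", 60, wander_rejected=True),    # failed inspection
]
kept = [s.orf for s in spots if keep_spot(s)]
print(kept)
```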
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co mosaic viral RNA was obtained by phenol and 
chloroform extractions of the virus and precipitated from ethanol.
CA-NC assembly reactions in the presence of noncognate RNAs were
identical to those given in (9). In the absence of RNA, CA-NC
cones formed under the following conditions: 300 µM CA-NC, 1 M
NaCl and 50 mM tris-HCl (pH 8.0) at 37°C for 60 min. In the
absence of exogenous RNA, neither cones nor cylinders formed at
concentrations of 0.5 M NaCl or below. Absorption spectra
demonstrated that our CA-NC preparations were not contaminated
with Escherichia coli RNA (estimated lower detection limit was ~1
base/protein molecule). To control for even lower levels of RNA
contamination, we preincubated the CA-NC protein with 0.5 mg/ml
ribonuclease A (Type 1-AS, 54 Kunitz U/mg, Sigma) for 1 hour at
4°C, which then formed cones normally.
14. M. Ge and K. Sattler, Chem. Phys. Lett. 220, 192 (1994).

The Transcriptional Program in
the Response of Human
Fibroblasts to Serum

Vishwanath R. Iyer, Michael B. Eisen, Douglas T. Ross,
Greg Schuler, Troy Moore, Jeffrey C. F. Lee, Jeffrey M. Trent,
Louis M. Staudt, James Hudson Jr., Mark S. Boguski,
Deval Lashkari, Dari Shalon, David Botstein, Patrick O. Brown*



The temporal program of gene expression during a model physiological re- 
sponse of human cells, the response of fibroblasts to serum, was explored with 
a complementary DNA microarray representing about 8600 different human 
genes. Genes could be clustered into groups on the basis of their temporal 
patterns of expression in this program. Many features of the transcriptional 
program appeared to be related to the physiology of wound repair, suggesting 
that fibroblasts play a larger and richer role in this complex multicellular 
response than had previously been appreciated. 



The response of mammalian fibroblasts to 
serum has been used as a model for studying 
growth control and cell cycle progression (/). 
Normal human fibroblasts require growth 
factors for proliferation in culture; these 
growth factors are usually provided by fetal 
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bovine serum (FBS). In the absence of 
growth factors, fibroblasts enter a nondivid- 
ing state, termed G Ut characterized by low 



metabolic activity. Addition of FBS or puri- 
fied growth factors induces proliferation of 
the fibroblasts; the changes in gene expres- 
sion that accompany this proliferative re- 
sponse have been the subject of many studies, 
and the responses of dozens of genes to se- 
rum have been characterized. 

We took a fresh look at the response of 
human fibroblasts to serum, using cDNA mi- 
croarrays representing about 8600 distinct hu- 
man genes to observe the temporal program of 
transcription that underlies this response. Pri- 
mary cultured fibroblasts from human neonatal 
foreskin were induced to enter a quiescent state 
by serum deprivation for 48 hours and then 
stimulated by addition of medium containing 
10% FBS (2). DNA microarray hybridization 
was used to measure the temporal changes in 
mRNA levels of 8613 human genes (3) at 12 
times, ranging from 15 min to 24 hours after 
serum stimulation. The cDNA made from pu- 
rified mRNA from each sample was labeled 
with the fluorescent dye Cy5 and mixed with a 
common reference probe consisting of cDNA 
made from purified mRNA from the quiescent 
Fig. 1. The same section of the microarray is shown for three independent hybridizations comparing RNA isolated at the 8-hour time point after serum treatment to RNA from serum-deprived cells. Each microarray contained 9996 elements, including 9804 human cDNAs, representing 8613 different genes. mRNA from serum-deprived cells was used to prepare cDNA labeled with Cy3-deoxyuridine triphosphate (dUTP), and mRNA harvested from cells at different times after serum stimulation was used to prepare cDNA labeled with Cy5-dUTP. The two cDNA probes were mixed and simultaneously hybridized to the microarray. The image of the subsequent scan shows genes whose mRNAs are more abundant in the serum-deprived fibroblasts (that is, suppressed by serum treatment) as green spots and genes whose mRNAs are more abundant in the serum-treated fibroblasts as red spots. Yellow spots represent genes whose expression does not vary substantially between the two samples. The arrows indicate the spots representing the following genes: 1, protein disulfide isomerase-related protein P5; 2, IL-8 precursor; 3, EST AA057170; and 4, vascular endothelial growth factor. 
culture (time zero) labeled with a second fluo- 
rescent dye, Cy3 (4). The color images of the 
hybridization results (Fig. 1) were made by 
representing the Cy3 fluorescent image as 
green and the Cy5 fluorescent image as red and 
merging the two color images. 
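
The two-color merge described above can be illustrated with a minimal Python sketch. It assumes each channel is a small grid of background-subtracted intensities and that a single `max_signal` value is used for scaling; the function name and the scaling rule are illustrative assumptions, not the authors' imaging software.

```python
def merge_channels(cy3, cy5, max_signal):
    """Merge two intensity grids into RGB pixels: Cy5 -> red, Cy3 -> green.

    Hypothetical helper: linear scaling to 0-255 is an assumption made
    for illustration only.
    """
    def scale(value):
        # Clamp to [0, max_signal], then map linearly onto 0..255.
        clamped = max(0.0, min(float(value), float(max_signal)))
        return round(255 * clamped / max_signal)

    return [
        [(scale(red), scale(green), 0) for green, red in zip(row3, row5)]
        for row3, row5 in zip(cy3, cy5)
    ]


if __name__ == "__main__":
    cy3 = [[100, 0, 100]]   # quiescent reference (green channel)
    cy5 = [[100, 100, 0]]   # serum-stimulated sample (red channel)
    print(merge_channels(cy3, cy5, max_signal=100))
```

In this sketch a spot with equal signal in both channels comes out yellow, a spot with signal only in Cy5 comes out red, and a spot with signal only in Cy3 comes out green, matching the color convention of Fig. 1.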

Diverse temporal profiles of gene expres- 
sion could be seen among the 8613 genes sur- 



Fig. 2. Cluster image showing the different classes of gene expression profiles. Five hundred seventeen genes whose mRNA levels changed in response to serum stimulation were selected (7). This subset of genes was clustered hierarchically into groups on the basis of the similarity of their expression profiles by the procedure of Eisen et al. (6). The expression pattern of each gene in this set is displayed here as a horizontal strip. For each gene, the ratio of mRNA levels in fibroblasts at the indicated time after serum stimulation ("unsync" denotes exponentially growing cells) to its level in the serum-deprived (time zero) fibroblasts is represented by a color, according to the color scale at the bottom. The graphs show the average expression profiles for the genes in the corresponding "cluster" (indicated by the letters A to J and color coding). In every case examined, when a gene was represented by more than one array element, the multiple representations in this set were seen to have identical or very similar expression profiles, and the profiles corresponding to these independent measurements clustered either adjacent or very close to each other, pointing to the robustness of the clustering algorithm in grouping genes with very similar patterns of expression. 
veyed in this experiment (Fig. 2); many of these 
genes (about half) were unnamed expressed 
sequence tags (ESTs) (5). Although diverse 
patterns of expression were observed, the order- 
ly choreography of the expression program be- 
came apparent when the results were analyzed 
by a clustering and display method developed 
in our laboratory for analyzing genome-wide 
gene expression data (6). An example of such 
an analysis, here applied to a subset of 517 
genes whose expression changed substantially 
in response to serum (7), is shown in Fig. 2. 
The entire detailed data set underlying Fig. 
2 is available as a tab-delimited table (in 
cluster order) at the Science Web site 
(www.sciencemag.org/feature/data/984559.shl). In 
addition, the entire, larger data set for the 
complete set of genes analyzed in this exper- 
iment can be found at a Web site maintained 
by our laboratory (genome-www.stanford.edu/serum) (5). 
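
As a rough illustration of the kind of hierarchical clustering referred to above, the following Python sketch groups expression profiles by average linkage on the Pearson correlation. The linkage rule, the similarity measure, the function names, and the toy profiles are all assumptions made for illustration; the text itself only cites the procedure (6) and does not spell it out.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering; returns a nested tuple of gene names.

    profiles: dict mapping a (hypothetical) gene name to its time course.
    At each step the two clusters with the highest mean pairwise
    correlation between their members are merged.
    """
    nodes = [(name, [name]) for name in profiles]  # (subtree, member names)
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                sims = [pearson(profiles[a], profiles[b])
                        for a in nodes[i][1] for b in nodes[j][1]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]


if __name__ == "__main__":
    toy = {
        "early1": [1, 5, 2, 1],
        "early2": [1, 6, 2, 1],
        "late1": [1, 1, 2, 5],
        "late2": [1, 1, 3, 6],
    }
    print(average_linkage(toy))
```

With the toy data, the two transiently induced profiles end up in one branch and the two late-induced profiles in the other, which is the behavior a cluster image such as Fig. 2 relies on.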

One measure of the reliability of the 
changes we observed is inherent in the ex- 
pression profiles of the genes. For most genes 
whose expression levels changed, we could 
see a gradual change over a few time points, 
which thus effectively provided independent 
measurements for almost all of the observa- 
tions. An additional check was provided by 
the inclusion of duplicate and, in a few cases, 
multiple array elements representing the 
same gene for about 5% of the genes included 
in this microarray. In addition, three indepen- 
dent hybridizations to different microarrays 
with mRNA samples from cells harvested 8 
hours after serum addition showed good cor- 
relation (Fig. 1). As an independent test, we 
measured the expression levels of several 
genes using the TaqMan 5' nuclease fluori- 
genic quantitative polymerase chain reaction 
(PCR) assay (9). The expression profiles of 
the genes, as measured by these two indepen- 
dent methods, were very similar (Fig. 3) (10). 

The transcriptional response of fibroblasts 
to serum was extremely rapid. The immediate 
response to serum stimulation was dominated 
by genes that encode transcription factors 
and other proteins involved in signal trans- 
duction. The mRNAs for several genes [in- 
cluding c-FOS, JUN B, and mitogen-acti- 
vated protein (MAP) kinase phosphatase- 1 
(MKP1)] were detectably induced within 
15 min after serum stimulation (Fig. 4, A 
and B). Fifteen of the genes that were 
observed to be induced by serum encode 
known or suspected regulators of transcrip- 
tion (Fig. 4B). All but one were immediate- 
early genes; their induction was not inhib- 
ited by cycloheximide (11). This class of 
genes could be divided into those 
whose induction was transient (Fig. 2, clus- 
ter E) and those whose mRNA levels re- 
mained induced for much longer (Fig. 2, 
clusters I and J). Some features of the 
immediate response appeared to be directed 
at adaptation to the initiating signals. We 
observed a marked induction of mRNA 
encoding MKP1, a dual-specificity phos- 
phatase that modulates the activity of the 
ERK1 and ERK2 MAP kinases (12). The 
coincidence of the peak of expression of 
genes in cluster E (Fig. 2) with that of 
MKP1 (Fig. 4A) suggests the possibility 
that continued activity of the MAP kinase path- 
way is required to maintain induction of these 
genes but not of those with sustained expression 
(clusters I and J). The gene encoding a second 
member of the dual-specificity MAP kinase 
phosphatase family, known as dual-specificity 
protein phosphatase 6/pyst2, was induced later, 
at about 4 hours after serum stimulation. Genes 
encoding diverse other proteins with roles in 
signal transduction, ranging from cell-surface 
receptors [for example, the sphingosine 1- 
phosphate receptor (EDG-1), the vascular en- 
dothelial growth factor receptor, and the type II 
BMP receptor] to regulators of G-protein sig- 
naling (for example, NET1/p115 rho GEF) to 
DNA-binding transcription factors, were in- 
duced by serum (Fig. 4A). 

The reprogramming of the regulatory cir- 
cuits in response to serum involved not only 
induction of transcription factors but also re- 
duced expression of many transcriptional reg- 
ulators — some of which may play roles in 
maintaining the cells in G0 or in priming 
them to react to wounding (Fig. 4C). Perhaps 
as a consequence of the historical focus on 
genes induced by serum stimulation of fibro- 
blasts, the set of transcription factors whose 
expression diminished upon serum stimula- 
tion has been less well characterized. 

Genes known or likely to be involved in 
controlling and mediating the proliferative re- 
sponse showed distinctive patterns of regula- 
tion. Several genes whose products inhibit pro- 
gression of the cell-division cycle, such as p27 
Kip1, p57 Kip2, and p18, were expressed in the 
quiescent fibroblasts and down-regulated be- 
fore the onset of cell division. The nadir in the 
mRNA levels for these genes occurred between 
6 and 12 hours after serum stimulation (Fig. 
5A), coincident with the passage of the fibro- 
blasts through G1. The levels of the transcript 
encoding the WEE1-like protein kinase, which 
is believed to inhibit mitosis by phosphoryl- 
ation of Cdc2, diminished between 4 and 8 to 
12 hours after serum addition (Fig. 5A), well 
before the onset of M phase at around 16 hours, 
raising the possibility of an additional role for 
Wee1 in an earlier stage of the cell cycle or in 
regulating the G0 to G1 transition. Several 
genes induced in the first few hours after serum 
stimulation, such as the helix-loop-helix pro- 
teins ID2 and ID3 and EST AA016305, a gene 
with homology to G1-S cyclins, are candidates 
for roles in promoting the exit from G0. 

Genes involved in mediating progression 
through the cell cycle were characterized by a 
distinctive pattern of expression (Fig. 2, clus- 
ter D), reflecting the coincidence of their 
expression with the reentry of the stimulated 
fibroblasts into the cell-division cycle. The 
stimulated fibroblasts replicated their DNA 
about 16 hours after serum treatment. This 
timing was reflected by the induction of 
mRNA encoding both subunits of ribonucle- 
otide reductase and PCNA, the processivity 
factor for DNA polymerase epsilon and delta. 
Cyclin A, Cyclin B1, Cdc2, and CDC28 ki- 
nase, regulators of passage through the S 
phase and the transition from G2 to M phase, 
were induced at about 16 to 20 hours after 
serum addition. The kinase in the Cyclin 
B1-CDK pair needs to be activated by phos- 
phorylation. The gene encoding Cyclin-de- 
pendent kinase 7 (CDK7; a homolog of Xe- 
nopus MO15 cdk-activating kinase) was in- 
duced in parallel with the Cdc2 and Cdc28 
kinases (Fig. 5A), suggesting a potential role 
for CDK7 in mediating M phase. DNA topo- 
isomerase II α, required for chromosome seg- 
regation at mitosis; Mad2, a component of 
the spindle checkpoint that prevents comple- 
tion of mitosis (anaphase) if chromosomes 
are not attached to the spindle; and the kinet- 
ochore protein CENP-F all showed a similar 
expression profile. 

In the hours after the serum stimulus, one of 
the most striking features of the unfolding tran- 
scriptional program was the appearance of nu- 
merous genes with known roles in processes 
relevant to the physiology of wound healing. 



These included both genes involved in the di- 
rect role played by fibroblasts in remodeling of 
the clot and the extracellular matrix and, more 
notably, genes encoding proteins involved in 
intercellular signaling (Fig. 5). Genes induced 
in this program encode products that can (i) 
participate in the dynamic process of clotting, 
clot dissolution, and remodeling and perhaps 
contribute to hemostasis by promoting local 
vasoconstriction (for example, endothelin-1); 
(ii) promote chemotaxis and activation of neu- 
trophils (for example, COX2) and recruitment 
and extravasation of monocytes and macro- 
phages (for example, MCP1); (iii) promote 
chemotaxis and activation of T lymphocytes 
[for example, interleukin-8 (IL-8)] and B 
lymphocytes (for example, ICAM-1), thus 
providing both innate and antigen-specific 
defenses against wound infection and recruit- 
ing the phagocytic cells that will be required 
to clear out the debris during remodeling of 
the wound; (iv) promote angiogenesis and 
neovascularization (for example, VEGF) 
through newly forming tissue; (v) promote 
migration and proliferation of fibroblasts (for 
example, CTGF) and their differentiation into 
myofibroblasts (for example, Vimentin); and 
(vi) promote migration and proliferation of 
keratinocytes, leading to reepithelialization 
of the wound (for example, FGF7), and pro- 
mote proliferation of melanocytes, perhaps 
contributing to wound hyperpigmentation 
(for example, FGF2). 

Coordinated regulation of groups of genes 
whose products act at different steps in a 
common process was a recurring theme. For 
example, Furin, a prohormone-processing 
protease required for one of the processing 
steps in the generation of active endothelin, 
was induced in parallel with induction of the 
gene encoding the precursor of endothelin-1 
(Fig. 5E) (13). Conversely, expression of 
CALLA/CD10, a membrane metalloprotease 
that degrades endothelin-1 and other peptide 
mediators of acute inflammation, was re- 
Fig. 3. Independent verification of microarray quantitation. Relative mRNA levels of the indicated genes (Mast, mast/stem cell growth factor receptor) were measured with the TaqMan 5' nuclease fluorigenic quantitative PCR assay (9) (left) in the same samples that were used to prepare probes for microarray hybridizations (right). Data from the TaqMan analysis were normalized to mRNA concentrations and plotted relative to the level at time zero, so that the results could be compared with those from the microarray hybridizations. In general, quantitation with the two methods gave very similar results (10). 
duced. A second example is provided by a set 
of five genes involved in the biosynthesis of 
cholesterol (Fig. 5I). The mRNAs encoding 
each of these enzymes showed sharply dimin- 
ished expression beginning 4 to 6 hours after 
serum stimulation of fibroblasts. A likely ex- 
planation for the coordinated down-regula- 
tion of the cholesterol biosynthetic pathway 
is that serum provides cholesterol to fibro- 
blasts through low-density lipoproteins, 
whereas in the absence of the cholesterol 
provided by serum, endogenous cholesterol 
biosynthesis in fibroblasts is required. 

Many of the previously studied genes that 
we observed to be regulated in this program 
have no recognized role in any aspect of wound 
healing or fibroblast proliferation. Their identi- 
fication in this study may therefore point to 
previously unknown aspects of these processes. 
A few selected genes in this group are shown in 
Fig. 5H. The stanniocalcin gene, for example 
(Fig. 5H), encodes a secreted protein without a 
clearly identified function in human cells (14, 
15). Its induction in serum-stimulated fibro- 
Fig. 4. "Reprogramming" of fibroblasts. Expression profiles of genes whose function is likely to play a role in the reprogramming phase of the response are shown with the same representation as in Fig. 2. In the cases in which a gene was represented by more than one element in the microarray, all measurements are shown. The genes were grouped into categories on the basis of our knowledge of their most likely role. Some genes with pleiotropic roles were included in more than one category. 
blasts suggests the possibility that it may play a 
role in the wound-healing process, perhaps 
serving as a signal in mediating inflammation 
or angiogenesis. 

One of the most important results of this 
exploration was the discovery of over 200 pre- 
viously unknown genes whose expression was 
regulated in specific temporal patterns during 
the response of fibroblasts to serum. For exam- 
ple, 13 of the 40 genes in cluster D (Fig. 2) have 
descriptive names that reflect their putative 
function. Nine of these 13 genes (69%) encode 
proteins that play roles in cell cycle progres- 
sion, particularly in DNA replication and the 
G2-M transition. This enrichment for cell 
cycle-related genes suggests that some of the 



unnamed genes in this cluster (for example, 
EST W79311 and EST R13146, neither of 
which has sequence similarity to previously 
characterized genes) may represent previously 
unknown genes involved in this part of the cell 
cycle. Similarly, a remarkable fraction of genes 
that were grouped into cluster F on the basis of 
their expression profiles encoded proteins in- 
volved in intercellular signaling (Fig. 2), sug- 
gesting that a similar role should be considered 
for the many unnamed genes in this cluster. A 
disproportionately large fraction of the genes 
whose transcription diminished upon serum 
stimulation were unnamed ESTs. 

Our intention was to use this experiment as 
a model to study the control of the transition 
Fig. 5. The transcriptional response to serum suggests a multifaceted role for fibroblasts in the physiology of wound healing. The features of the transcriptional program of fibroblasts in response to serum stimulation that appear to be related to various aspects of the wound-healing process and fibroblast proliferation are shown with the same convention for representing changes in transcript levels as was used in Figs. 2 and 4. (A) Cell cycle and proliferation, (B) coagulation and hemostasis, (C) inflammation, (D) angiogenesis, (E) tissue remodeling, (F) cytoskeletal reorganization, (G) reepithelialization, (H) unidentified role in wound healing, and (I) cholesterol biosynthesis. The numbers in (C) and (G) refer to genes whose products serve as signals to neutrophils (C1), monocytes and macrophages (C2), T lymphocytes (C3), B lymphocytes (C4), and melanocytes (G1). 
from G0 to a proliferating state. However, one 
of the defining characteristics of genome-scale 
expression profiling experiments is that the ex- 
amination of so many diverse genes opens a 
window on all the processes that actually occur 
and not merely the single process one intended 
to observe. Serum, the soluble fraction of clot- 
ted blood, is normally encountered by cells in 
vivo in the context of a wound. Indeed, the 
expression program that we observed in re- 
sponse to serum suggests that fibroblasts are 
programmed to interpret the abrupt exposure to 
serum not as a general mitogenic stimulus but 
as a specific physiological signal, signifying a 
wound. The proliferative response that we orig- 
inally intended to study appeared to be part of a 
larger physiological response of fibroblasts to a 
wound. Other features of the transcriptional 
response to serum suggest that the fibroblast is 
an active participant in a conversation among 
the diverse cells that work together in wound 
repair, interpreting, amplifying, modifying, and 
broadcasting signals controlling inflammation, 
angiogenesis, and epithelial regrowth during 
the response to an injury. 

We recognize that these in vitro results 
almost certainly represent a distorted and in- 
complete rendering of the normal physiolog- 
ical response of a fibroblast to a wound. 
Moreover, only the responses elicited directly 
by exposure of fibroblasts to serum were 
examined. The subsequent signals from other 
cellular participants in the normal wound- 
healing process would certainly provoke fur- 
ther evolution of the transcriptional program 
in fibroblasts at the site of a wound, which 
this experiment cannot reveal. Nevertheless, 
we believe that the picture that emerged 
strongly suggests a much larger and richer 
role for the fibroblast in the orchestration of 
this important physiological process than had 
previously been suspected. 
Systematic variation in gene expression 
patterns in human cancer cell lines 

Douglas T. Ross 1 , Uwe Scherf 5 , Michael B. Eisen 2 , Charles M. Perou 2 , Christian Rees 2 , Paul Spellman 2 , 
Vishwanath Iyer 3 , Stefanie S. Jeffrey 3 , Matt Van de Rijn 4 , Mark Waltham 5 , Alexander Pergamenschikov 2 , 
Jeffrey C.F. Lee 6 , Deval Lashkari 7 , Dari Shalon 6 , Timothy G. Myers 8 , John N. Weinstein 5 , David Botstein 2 
& Patrick O. Brown 1,9 

We used cDNA microarrays to explore the variation in expression of approximately 8,000 unique genes among the 
60 cell lines used in the National Cancer Institute's screen for anti-cancer drugs. Classification of the cell lines based 
solely on the observed patterns of gene expression revealed a correspondence to the ostensible origins of the 
tumours from which the cell lines were derived. The consistent relationship between the gene expression patterns 
and the tissue of origin allowed us to recognize outliers whose previous classification appeared incorrect. Specific 
features of the gene expression patterns appeared to be related to physiological properties of the cell lines, such 
as their doubling time in culture, drug metabolism or the interferon response. Comparison of gene expression pat- 
terns in the cell lines to those observed in normal breast tissue or in breast tumour specimens revealed features of 
the expression patterns in the tumours that had recognizable counterparts in specific cell lines, reflecting the 
tumour, stromal and inflammatory components of the tumour tissue. These results provided a novel molecular 
characterization of this important group of human cell lines and their relationships to tumours in vivo. 



Introduction 

Cell lines derived from human tumours have been extensively used 
as experimental models of neoplastic disease. Although such cell 
lines differ from both normal and cancerous tissue, the inaccessi- 
bility of human tumours and normal tissue makes it likely that 
such cell lines will continue to be used as experimental models for 
the foreseeable future. The National Cancer Institute's Develop- 
mental Therapeutics Program (DTP) has carried out intensive 
studies of 60 cancer cell lines (the NCI 60) derived from tumours 
from a variety of tissues and organs 1-4 . The DTP has assessed many 
molecular features of the cells related to cancer and chemothera- 
peutic sensitivity, and has measured the sensitivities of these 60 cell 
lines to more than 70,000 different chemical compounds, includ- 
ing all common chemotherapeutics (http://dtp.nci.nih.gov). A 
previous analysis of these data revealed a connection between the 
pattern of activity of a drug and its method of action. In particular, 
there was a tendency for groups of drugs with similar patterns of 
activity to have related methods of action 3,5-7 . 

We used DNA microarrays to survey the variation in abun- 
dance of approximately 8,000 distinct human transcripts in these 
60 cell lines. Because of the logical connection between the func- 
tion of a gene and its pattern of expression, the correlation of gene 
expression patterns with the variation in the phenotype of the cell 
can begin the process by which the function of a gene can be 
inferred. Similarly, the patterns of expression of known genes can 



reveal novel phenotypic aspects of the cells and tissues studied 8-10 . 
Here we present an analysis of the observed patterns of gene 
expression and their relationship to phenotypic properties of the 
60 cell lines. The accompanying report 11 explores the relationship 
between the gene expression patterns and the drug sensitivity pro- 
files measured by the DTP. The assessment of gene expression pat- 
terns in a multitude of cell and tissue types, such as the diverse set 
of cell lines we studied here, under diverse conditions in vitro and 
in vivo, should lead to increasingly detailed maps of the human 
gene expression program and provide clues as to the physiological 
roles of uncharacterized genes 11-16 . The databases, plus tools for 
analysis and visualization of the data, are available 
(http://genome-www.stanford.edu/nci60 and http://discover.nci.nih.gov). 

Results 

We studied gene expression in the 60 cell lines using DNA 
microarrays prepared by robotically spotting 9,703 human 
cDNAs on glass microscope slides 17,18 . The cDNAs included 
approximately 8,000 different genes: approximately 3,700 repre- 
sented previously characterized human proteins, an additional 
1,900 had homologues in other organisms and the remaining 
2,400 were identified only by ESTs. Due to ambiguity of the iden- 
tity of the cDNA clones used in these studies, we estimated that 
approximately 80% of the genes in these experiments were cor- 
rectly identified. The identities of approximately 3,000 cDNAs 



Departments of 1 Biochemistry, 2 Genetics, 3 Surgery and 4 Pathology, Stanford University School of Medicine, Stanford, California, USA. 5 Laboratory of 
Molecular Pharmacology, Division of Basic Sciences, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA. 6 Incyte 
Pharmaceuticals, Fremont, California, USA. 7 Genometrix Inc., The Woodlands, Texas, USA. 8 Information Technology Branch, Developmental Therapeutics 
Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Rockville, Maryland, USA. 9 Howard Hughes 
Medical Institute, Stanford University School of Medicine, Stanford, California, USA. Correspondence should be addressed to P.O.B. (e-mail: 
pbrown@cmgm.stanford.edu) or J.N.W. (e-mail: weinstein@dtpax2.ncifcrf.gov). 



nature genetics * volume 24 • march 2000 






©2000 Nature America Inc. • http://genetics.nature.com 







Fig. 1 Gene expression patterns related to the tissue of origin of the cell lines. Two-dimensional hierarchical clustering was applied to expression data from a set of 1,161 cDNAs measured across 64 cell lines. The 1,161 cDNAs were those (of 9,703 total) with transcript levels that varied by at least sevenfold (log2(ratio) > 2.8) relative to the reference pool in at least 4 of 60 cell lines. This effectively selected genes with the greatest variation in expression level across the 60 cell lines (including those genes not well represented in the reference pool), and therefore highlighted those gene expression patterns that best distinguished the cell lines from one another. Data from 64 hybridizations were used: one for each cell line plus the two additional independent representations of each of the cell lines K562 and MCF7. The two cell lines represented in triplicate were correspondingly weighted for the gene clustering so that each of the 60 cell lines contributed equally to the clustering. a, The cell-line dendrogram, with the terminal branches coloured to reflect the ostensible tissue of origin of the cell line (red, leukaemia; green, colon; pink, breast; purple, prostate; light blue, lung; orange, ovarian; yellow, renal; grey, CNS; brown, melanoma; black, unknown (NCI/ADR-RES)). The scale to the right of the dendrogram depicts the correlation coefficient represented by the length of the dendrogram branches connecting pairs of nodes. Note that the two triplets of replicated cell lines (K562 and MCF7) cluster tightly together and were well differentiated from even the most closely related cell lines, indicating that this clustering of cell lines is based on characteristic variations in their gene expression patterns rather than artefacts of the experimental procedures. b, A coloured representation of the data table, with the rows (genes) and columns (cell lines) in cluster order. The dendrogram representing hierarchical relationships between genes was omitted for clarity, but is available (http://genome-www.stanford.edu/nci60). The colour in each cell of this table reflects the mean-adjusted expression level of the gene (row) and cell line (column). The colour scale used to represent the expression ratios is shown. The labels '3a-3d' in (b) refer to the clusters of genes shown in detail in Fig. 3. 
from these experiments have been sequence-verified, including 
all of those referred to here by name. 

Each hybridization compared Cy5-labelled cDNA reverse tran- 
scribed from mRNA isolated from one of the cell lines with Cy3- 
labelled cDNA reverse transcribed from a reference mRNA 
sample. This reference sample, used in all hybridizations, was 
prepared by combining an equal mixture of mRNA from 12 of 
the cell lines (chosen to maximize diversity in gene expression as 
determined primarily from two-dimensional gel studies 2 ). By 
comparing cDNA from each cell line with a common reference, 
variation in gene expression across the 60 cell lines could be 
inferred from the observed variation in the normalized Cy5/Cy3 
ratios across the hybridizations. 
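
A minimal Python sketch of how ratios against a common reference might be put on a comparable scale is given below. It assumes background-subtracted intensities and uses median centering of the log2 ratios as the normalization step; the text says only that the ratios were normalized, so the centering rule and the function name are illustrative assumptions.

```python
import math
from statistics import median


def normalized_log_ratios(cy5, cy3):
    """Return median-centered log2(Cy5/Cy3) values for one hybridization.

    cy5: sample-channel intensities (one per spot)
    cy3: reference-channel intensities (same spot order)
    Median centering is an assumed normalization, chosen for illustration.
    """
    raw = [math.log2(sample / reference) for sample, reference in zip(cy5, cy3)]
    center = median(raw)
    return [value - center for value in raw]


if __name__ == "__main__":
    sample = [200.0, 400.0, 800.0]
    reference = [100.0, 100.0, 100.0]
    print(normalized_log_ratios(sample, reference))
```

Because every cell line is compared with the same reference pool, the centered log ratios for a given gene can then be compared directly across hybridizations.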

To assess the contribution of artefactual sources of variation in 
the experimentally measured expression patterns, K562 and 
MCF7 cell lines were each grown in three independent cultures, 
and the entire process was carried out independently on mRNA 
extracted from each culture. The variance in the triplicate fluo- 
rescence ratio measurements approached a minimum when the 
fluorescence signal was greater than approximately 0.4% of the 
measurable total signal dynamic range above background in 
either channel of the hybridization. We selected the subset of 
spots for which significant signal was present in both the numer- 
ator and denominator of the ratios by this criterion to identify 
the best-measured spots. The pair-wise correlation coefficients 
for the triplicates of the set of genes that passed this quality con- 
trol level (6,992 spots included for the MCF7 samples and 6,161 
spots for K562) ranged from 0.63 to 0.92 (for graphs and details, 
see http://genome-www.stanford.edu/nci60). 
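
The reproducibility check described above can be sketched in Python as a signal filter followed by pairwise Pearson correlations among replicates. The 0.4% cutoff comes from the text; the function names, the `dynamic_range` parameter, and the toy values are hypothetical and serve only to illustrate the idea.

```python
import math
from itertools import combinations


def well_measured(cy5, cy3, dynamic_range, fraction=0.004):
    """True if both channels exceed `fraction` of the dynamic range.

    dynamic_range is whatever the scanner provides above background;
    its value here is an assumption supplied by the caller.
    """
    cutoff = fraction * dynamic_range
    return cy5 > cutoff and cy3 > cutoff


def pearson(x, y):
    """Pearson correlation between two equal-length lists of log ratios."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(replicates):
    """Correlation for every pair of replicate hybridizations."""
    return {
        (i, j): pearson(replicates[i], replicates[j])
        for i, j in combinations(range(len(replicates)), 2)
    }


if __name__ == "__main__":
    # Three toy replicate profiles over the same well-measured spots.
    replicates = [[0.1, 1.2, -0.8, 2.0], [0.0, 1.0, -1.0, 2.2], [0.3, 0.9, -0.6, 1.7]]
    for pair, r in pairwise_correlations(replicates).items():
        print(pair, round(r, 3))
```

For a triplicate, this yields three coefficients, one per pair, which is how a range such as the one reported above would arise.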

To make the orderly features in the data more apparent, we used 
a hierarchical clustering algorithm 19,20 and a pseudo-colour visu- 
alization matrix 3,21 . The object of the clustering was to group cell 
lines with similar repertoires of expressed genes and to group 
genes whose expression level varied among the 60 cell lines in a 
similar manner. Clustering was performed twice using different 
subsets of genes to assess the robustness of the analysis. In one case 
(Fig. 3), we concentrated on those genes that showed the most 
variation in expression among the 60 cell lines (1,167 total). A sec- 
ond analysis (Fig. 2) included all spots that were thought to be well 
measured in the reference set (6,831 spots). 
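
Selecting the most variable genes can be sketched in Python as follows, using the criterion stated in the Fig. 1 legend (at least sevenfold relative to the reference, log2(ratio) > 2.8, in at least 4 cell lines). Treating the threshold as applying to the absolute log ratio, so that both increases and decreases count, is an assumption, as are the function name and the toy data.

```python
import math


def select_variable_genes(log_ratios, threshold=2.8, min_lines=4):
    """Return genes whose |log2 ratio| exceeds `threshold` in >= `min_lines` lines.

    log_ratios: dict mapping a (hypothetical) gene name to a list of
    log2(sample/reference) values, one per cell line.
    """
    selected = []
    for gene, values in log_ratios.items():
        hits = sum(1 for value in values if abs(value) > threshold)
        if hits >= min_lines:
            selected.append(gene)
    return selected


if __name__ == "__main__":
    # A sevenfold change corresponds to log2(7), about 2.807.
    print(round(math.log2(7), 3))
    toy = {
        "geneA": [3.0, 3.1, -3.2, -2.9, 0.0],   # 4 lines beyond threshold
        "geneB": [3.0, 3.0, 3.0, 0.5, -0.2],    # only 3 lines beyond threshold
    }
    print(select_variable_genes(toy))
```

A filter of this kind keeps only genes with large swings in a handful of cell lines, which is why the resulting subset emphasizes the patterns that best separate the lines from one another.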

Gene expression patterns related to the histologic 
origins of the cell lines 

The most notable property of the clustered data was that cell lines 
with common presumptive tissues of origin grouped together 
(Figs 1a and 2). Cell lines derived from leukaemia, melanoma, 
central nervous system, colon, renal and ovarian tissue were clus- 
tered into independent terminal branches specific to their respec- 
tive organ types with few exceptions. Cell lines derived from 
non-small lung carcinoma and breast tumours were distributed 
in multiple different terminal branches suggesting that their gene 
expression patterns were more heterogeneous. 

Many of these coherent cell line clusters were distinguished by 
the specific expression of characteristic groups of genes 
(Fig. 3a-d). For example, a cluster of approximately 90 genes was 
highly expressed in the melanoma-derived lines (Fig. 3c). This set 
was enriched for genes with known roles in melanocyte biology, 
including tyrosinase and dopachrome tautomerase (TYR and 
DCT; two subunits of an enzyme complex involved in melanin 
synthesis 22 ), MART1 (MLANA; which is being investigated as a 
target for immunotherapy of melanoma 23 ) and S100-β (S100B; 
which has been used as an antigenic marker in the diagnosis of 
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Fig. 2 Gene expression patterns related to 
other cell-line p he no types. a. We applied 
two-dimensional hierarchical clustering to 
expression data from a set of 6,831 cONAs 
measured across the 64 cell lines. The 6,831 
cDNAs were those with a minimum fluores- 
cence signal intensity of approximately 0.4 % 
of the dynamic range above background in 
the reference channel in each of the six 
hybridizations used to establish reproducibil- 
ity. This effectively selected those spots that 
provided the most reliable ratio measure* 
ments and therefore identified a subset of 
genes useful for exploring patterns comprised 
of those whose variation in expression across 
the 60 cell lines was of moderate magnitude. 
b. Cluster-ordered data table, c. Doubling 
time of cell lines. Cell lines are given in cluster 
order. Values are plotted relative to the mean. 
Doubling times greater than the mean are 
shown in green, those with doubling time. less 
than the mean are shown in red. d. Three 
related gene clusters that were enriched for 
genes whose expression level variation was 
correlated with cell line proliferation rate. 
Each of the three gene clusters (clustered 
solely on the basis of their expression pat- 
terns) showed enrichment for sets of genes 
involved in distinct functional categories (for 
example, ribosomal genes versus genes 
involved in pre-RNA splicing), e, Gene cluster 
in which all characterized and sequence -veri- 
fied cDNAs encode genes known to be regu- 
lated by interferons, f, Gene cluster enriched 
for genes that have been implicated in drug 
metabolism (indicated by asterisks). A further 
property of the gene clustering evident here 
and in Fig. 2 is the strong tendency for redun- 
dant representations of the same gene to 
cluster immediately adjacent to one another, 
even within larger groups of genes with very 
similar expression patterns. In addition to 
illustrating the reproducibility and consis- 
tency of the measurements, and providing 
independent confirmation of many of our 
measurements, this property also demon- 
strates that these, and probably all, genes 
have nearly unique patterns of variation 
across the 60 cell lines. If this were not the 
case, and multiple genes had identical pat- 
terns of variation, we would not expect to be 
able to distinguish, by clustering on the basis 
of expression variation, duplicate copies of 
individual genes from the other genes with 
identical expression patterns. 
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melanoma). LOXIMVI, the seventh line designated as melanoma 
in the NCI60, did not show this characteristic pattern. Although 
isolated from a patient with melanoma, LOXIMVI has previously 
been noted to lack melanin and other markers useful for identifi- 
cation of melanoma cells 1 . 

Paradoxically, two related cell lines (MDA-MB-435 and MDA-N),
which were derived from a single patient with breast cancer
and have been conventionally regarded as breast cancer cell lines,
shared expression of the genes associated with melanoma.
MDA-MB-435 was isolated from a pleural effusion in a patient with
metastatic ductal adenocarcinoma of the breast 24,25 . It remains
possible that the origin of the cell line was a breast cancer, and that
its gene expression pattern is related to the neuroendocrine
features of some breast cancers 26 . But our results suggest that this cell
line may have originated from a melanoma, raising the possibility 
that the patient had a co-existing occult melanoma. 

The higher-level organization of the cell-line tree — in which 
groups span cell lines from different tissue types — also reflected 
shared biological properties of the tissues from which the cell 
lines were derived. The carcinoma-derived cell lines were divided 
into major branches that separated those that expressed genes 
characteristic of epithelial cells from those that expressed genes 
more typical of stromal cells. A cluster of genes is shown (Fig. 3b) 
that is most strongly expressed in cell lines derived from colon 
carcinomas, six of seven ovarian-derived cell lines and the two 
breast cancer lines positive for the oestrogen receptor. The named 
genes in this cluster have been implicated in several aspects of 
epithelial cell biology 27 . The cluster was enriched for genes whose 
products are known to localize to the basolateral membrane of 
epithelial cells, including those encoding components of 
adherens complexes (for example, desmoplakin (DSP),
periplakin (PPL) and plakoglobin (JUP)), an epithelial-expressed
cell-cell adhesion molecule (M4S1) and a sodium/hydrogen
ion exchanger 28-31 (SLC9A1). It also contained genes
that encode putative transcriptional regulators of epithelial
morphogenesis, a human homologue of a Drosophila melanogaster
epithelial-expressed tumour suppressor (LLGL1) and a homeobox
gene thought to control calcium-mediated adherence in
epithelial cells 32,33 (MSX2).

In contrast, a separate, major branch of the cell-line dendrogram
(Fig. 1a) included all glioblastoma-derived cell lines, all
renal-cell-carcinoma-derived cell lines and the remaining
carcinoma-derived lines. The characteristic set of genes expressed in
this cluster included many whose products are involved in stromal
cell functions (Fig. 3d). Indeed, the two cell lines originally
described as 'sarcoma-like' in appearance (Hs578T, breast
carcinosarcoma, and SF539, gliosarcoma) expressed most of these
genes 34,35 . Although no single gene was uniformly characteristic
of this cluster, each cell line showed a distinctive pattern of 
expression of genes encoding proteins with roles in synthesis or 
modification of the extracellular matrix (for example, caldesmon 
(CALD1), cathepsins, thrombospondin (THBS), lysyl oxidase 
(LOX) and collagen subtypes). Although the ovarian and most 
non-small-cell-lung-derived carcinomas expressed genes charac- 
teristic of both epithelial cells and stromal cells, they probably 
clustered with the CNS and renal cell carcinomas in this analysis 
because genes characteristically expressed in stromal cells were 
more abundantly represented in this gene set. 



Physiological variation reflected
in gene expression patterns

A cluster diagram of 6,831 genes (Fig. 2) is useful for exploring
clusters of genes whose variation in mRNA levels was not obviously
attributable to cell or tissue type. We identified some gene
clusters that were enriched for genes involved in specific cellular
processes; the variation in their expression levels may reflect
corresponding differences in activity of these processes in the cell
lines. For example, a cluster of 1,159 genes (Fig. 2a) included
many whose products are necessary for progression through the
cell cycle (such as CCNA1, MCM106 and MAD2L1), RNA processing
and translation machinery (such as RNA helicases,
hnRNPs and translation elongation factors) and traditional
pathologic markers used to identify proliferating cells (MKI67).
Within this large cluster were smaller clusters enriched for genes
with more specialized roles. One cluster was highly enriched for
numerous ribosomal genes, whereas another was more enriched
for genes encoding RNA-splicing factors. The variation in
expression of these ribosomal genes was significantly correlated
with variation in the cell doubling time (correlation coefficient of
0.54), supporting the notion that the genes in this cluster were
regulated in relation to cell proliferation rate or growth rate in
these cell lines.

In a smaller gene cluster (Fig. 2d), all of the named genes were
previously known to be regulated by interferons 13,36 . Additional
groups of interferon-regulated genes showed distinct patterns of
expression (data not shown), suggesting that the NCI60 cell lines
exhibited variation in activity of interferon-response pathways,
which was reflected in gene expression patterns 36 .

Another cluster (Fig. 2e) contained several genes encoding
proteins with possible interrelated roles in drug metabolism,
including glutamate-cysteine ligase (GLCLC, the enzyme responsible
for the rate-limiting step of glutathione synthesis), thioredoxin
(TXN) and thioredoxin reductase (TXNRD1; enzymes
involved in regulating redox state in cells), and MRP1 (a drug
transporter known to efficiently transport glutathione-conjugated
compounds 37 ). The elevated expression of this set of genes
in a subset of these cell lines may reflect selection for resistance to
chemotherapeutics.



Cell lines facilitate interpretation of gene expression 
patterns in complex clinical samples 

Like many other types of cancer, tumours of the breast typically 
have a complex histological organization, with connective tissue 
and leukocytic infiltrates interwoven with tumour cells. To 
explore the possibility that variation in gene expression in the 
tumour cell lines might provide a framework for interpreting the 
expression patterns in tumour specimens, we compared RNA 
isolated from two breast cancer biopsy samples, a sample of nor- 
mal breast tissue and the NCI60 cell lines derived from breast 
cancers (excluding MDA-MB-435 and MDA-N) and leukaemias 
(Fig. 4). This clustering highlighted features of the gene expression
pattern shared between the cancer specimens and individual
cell lines derived from breast cancers and leukaemias.

The genes encoding keratin 8 (KRT8) and keratin 19 (KRT19),
as well as most of the other 'epithelial' genes defined in the complete
NCI60 cell line cluster, were expressed in both of the biopsy
samples and the two breast-derived cell lines, MCF-7 and T47D,
expressing the oestrogen receptor, suggesting that these transcripts
originated in tumour cells with features similar to those of
luminal epithelial cells (Fig. 5a). Expression of a set of genes
characteristic of stromal cells, including collagen genes (COL1A1,
COL5A1 and COL6A1) and smooth muscle cell markers
(TAGLN), was a feature shared by the tumour sample and the
stromal-like cell lines Hs578T and BT549 (Fig. 5b). This feature
of the expression pattern seen in the tumour samples is likely to 
be due to the stromal component of the tumour. The tumours 
also shared expression of a set of genes (Fig. 5c) with the multiple 
myeloma cell line (RPMI-8226), notably including
immunoglobulin genes, consistent with the presence of B cells 
in the tumour (this was confirmed by staining with
anti-immunoglobulin antibodies; data not shown). Therefore,
distinct sets of genes with co-varying expression among the samples
(Fig. 4, arrow) appear to represent distinct cell types that can be
distinguished in breast cancer tissue. A fourth cluster of genes,
more highly expressed in all of the cell lines than in any of the
clinical specimens, was enriched for genes present in the
'proliferation' cluster described above (Fig. 5d). The variation in
expression of these genes likely paralleled the difference in
proliferation rate between the rapidly cycling cultured cell lines and the
much more slowly dividing cells in tissues.

Discussion 

Newly available genomics tools allowed us to explore variation in 
gene expression on a genomic scale in 60 cell lines derived from 
diverse tumour tissues. We used a simple cluster analysis to iden- 
tify the prominent features in the gene expression patterns that 
appeared to reflect 'molecular signatures' of the tissue from 
which the cells originated. The histological characteristics of the 
cell lines that dominated the clustering were pervasive enough 
that similar relationships were revealed when alternative subsets 
of genes were selected for analysis. Additional features of the 
expression pattern may be related to variation in physiological 
attributes such as proliferation rate and activity of interferon- 
response pathways. 

The properties of the tumour-derived cell lines in this study 
have presumably all been shaped by selection for resistance to 
host defences and chemotherapeutics and for rapid proliferation 
in the tissue culture environment of synthetic growth media, fetal 
bovine serum and a polystyrene substratum. But the primary 
identifiable factor accounting for variation in gene expression 
patterns among these 60 cell lines was the identity of the tissue 
from which each cell line was ostensibly derived. For most of the 
cell lines we examined, neither physiological nor experimental 
adaptation for growth in culture was sufficient to overwrite the 
gene expression programs established during differentiation in 
vivo. Nevertheless, the prominence of mesenchymal features in 
the cell lines isolated from glioblastomas and carcinomas may 
reflect a selection for the relative ease of establishment of cell 
lines expressing stromal characteristics, perhaps combined with 
physiological adaptation to tissue culture conditions 38-40 .



Fig. 4 Comparison of the gene expression patterns in clinical breast
cancer specimens and cultured breast cancer and leukaemia cell lines.
a, Two-dimensional hierarchical clustering applied to gene expression
data for two breast cancer specimens, [...] from one patient, normal
breast and the NCI60 breast and leukaemia-derived cell lines. The gene
expression data from tissue specimens was clustered along with
expression data from a subset of the NCI60 cell lines to explore
whether features of expression patterns observed in specific lines
could be identified in the tissue samples. Labels indicate gene
clusters (shown in detail in Fig. 5) that may be related to specific
cellular components of the tumour specimens. b, Breast cancer specimen
16 stained with anti-keratin antibodies, showing the complex mix of
cell types characteristically found in breast tumours. The arrows
highlight the different cellular components of this tissue specimen
that were distinguished by the gene expression cluster analysis (Fig. 5).



Biological themes linking genes with related expression pat- 
terns may be inferred in many cases from the shared attributes of 
known genes within the clusters. Uncharacterized cDNAs are
likely to encode proteins that have roles similar to those of the
known gene products with which they appear to be co-regulated.
Still, for several clusters of genes, we were unable to discern a
common theme linking the identified members of the cluster. Further
exploration of their variation in expression under more diverse
conditions and more comprehensive investigation of the
physiology of the NCI60 cells may provide insight 10 . The relationship of
the gene expression patterns to the drug sensitivity patterns
measured by the DTP is an example of linking variation in gene
expression with more subtle and diverse phenotypic variation 12 .

The patterns of gene expression measured in the NCI60 cell
lines provide a framework that helps to distinguish the cells that
express specific sets of genes in the histologically complex breast
cancer specimens 41 . Although it is now feasible to analyse gene
expression in micro-dissected tumour specimens 42,43 , this
observation suggests that it will be possible to explore and interpret
some of the biology of clinical tumour samples by sampling them
intact. As is useful in conventional morphological pathology, one
might be able to observe interactions between a tumour and its
microenvironment in this way. These relationships will be
clarified by suitable analysis of gene expression patterns from intact as
well as dissected tumours 12,14,15,41 .

Methods 

cDNA clones. We obtained the 9,703 human cDNA clones (Research Genetics)
used in these experiments as bacterial colonies in 96-well microtitre
plates. Approximately 8,000 distinct Unigene clusters (representing
nominally unique genes) were represented in this set of clones. All genes
identified here by name represent clones whose identities were confirmed by
re-sequencing, or by the criteria that two or more independent cDNA clones
ostensibly representing the same gene had nearly identical gene expression
patterns. A single-pass 3' sequence re-verification was attempted for every
clone after re-streaking for single colonies. For a subset of genes for which
quality 3' sequence was not obtained, we attempted to confirm identities by
5' sequencing. Of the subset of clones selected for 5' sequence verification
on the basis of an interesting pattern of expression (888 total), 331 were
correctly identified, 57 incorrectly identified and 500 indeterminate (poor
quality sequence). We estimated that 15%-20% of array elements contained
DNA representing more than one clone per well. So far, the identities of
~3,000 clones have been verified. The full list of clones used and their
nominal identities are available (gene names preceded by the designation "SID*"
(Stanford Identification) represent clones whose identities have not yet been
verified; http://genome-www.stanford.edu:8000/nci60).

Production of cDNA microarrays. The arrays used in this experiment were 
produced at Synteni Inc. (now Incyte Pharmaceuticals). Each insert was
amplified from a bacterial colony by sampling 1 µl of bacterial media and
performing PCR amplification of the insert using consensus primers for
the three plasmids represented in the clone set (5'-TTGTAAAACGACGGCCAGTG-3',
5'-CACACAGGAAACAGCTATG-3'). Each PCR product
(100 µl) was purified by gel exclusion, concentrated and resuspended in
3xSSC (10 µl). The PCR products were then printed on treated glass
microscope slides using a robot with four printing tips. Detailed protocols
for assembling and operating a microarray printer, and printing and
experimental application of DNA microarrays are available
(http://cmgm.stanford.edu/pbrown).

Preparation of mRNA and reference pool. Cell lines were grown from NCI
DTP frozen stocks in RPMI-1640 supplemented with phenol red, glutamine
(2 mM) and 5% fetal calf serum. To minimize the contribution of variations
in culture conditions or cell density to differential gene expression, we grew
each cell line to 80% confluence and isolated mRNA 24 h after transfer to
fresh medium. The time between removal from the incubator and lysis of the
cells in RNA stabilization buffer was minimized (<1 min). Cells were lysed in
buffer containing guanidium isothiocyanate and total RNA was purified
with the RNeasy purification kit (Qiagen). We purified mRNA as needed
using a poly(A) purification kit (Oligotex, Qiagen) according to the
manufacturer's instructions. Denaturing agarose gel electrophoresis assessed
the integrity and relative contamination of mRNA with ribosomal RNA.

The breast tumours were surgically excised from patients and rapidly
transported to the pathology laboratory, where samples for microarray
analysis were quickly frozen in liquid nitrogen and stored at -80 °C until
use. A frozen tumour specimen was removed from the freezer, cut into
small pieces (~50-100 mg each), immediately placed into 10-12 ml of
Trizol reagent (Gibco-BRL) and homogenized using a PowerGen 125 Tissue
Homogenizer (Fisher Scientific), starting at 5,000 r.p.m. and gradually
increasing to ~20,000 r.p.m. over a period of 30-60 s. We processed the
Trizol/tumour homogenate as described in the Trizol protocol, including an
initial step to remove fat. Once total RNA was obtained, we isolated mRNA

We combined mRNA from the following cells in equal quantities to
make the reference pool: HL-60 (acute myeloid leukaemia) and K562
(chronic myeloid leukaemia); NCI-H226 (non-small-cell-lung); COLO
205 (colon); SNB-19 (central nervous system); LOX-IMVI (melanoma);
OVCAR-3 and OVCAR-4 (ovarian); CAKI-1 (renal); PC-3 (prostate); and
MCF7 and Hs578T (breast). The criteria for selection of the cell lines in
the reference are described in detail in the accompanying manuscript 12 .

Doubling-time calculations. We calculated doubling times based on routine
NCI60 cell line compound screening data; they reflect the doubling
times for cells inoculated into 96-well plates at the screening inoculation
densities and grown in RPMI 1640 medium supplemented with 5%
fetal bovine serum for 48 h. We measured cell populations using a
sulforhodamine B optical density measurement assay. The doubling time constant $k$
was calculated using the equation $N/N_0 = e^{kt}$, where $N_0$ is optical density
for control (untreated) cells at time zero, $N$ is optical density for control cells
after 48-h incubation, and $t$ is 48 h. The same equation was then used with the
derived $k$ to calculate the doubling time $t$ by setting $N/N_0 = 2$ (equivalently,
$t = \ln 2 / k$). For a given cell line, we obtained $N_0$ and $N$ values by
averaging optical densities (N>6,000) obtained for each cell line for a year's
screening. Data and experimental details are available (http://dtp.nci.nih.gov).
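
The calculation described above can be sketched in a few lines. This is only an illustration of the stated exponential-growth model, not the DTP's own code; the function name and the example optical densities are hypothetical.

```python
import math

def doubling_time(od_start, od_end, hours=48.0):
    """Doubling time (h) from optical densities at time zero and after `hours`."""
    if od_start <= 0 or od_end <= od_start:
        raise ValueError("need 0 < od_start < od_end for a growing population")
    k = math.log(od_end / od_start) / hours  # growth constant from N/N0 = e^(kt)
    return math.log(2.0) / k                 # time at which N/N0 = 2

# Hypothetical optical densities: a fourfold increase over 48 h.
print(doubling_time(0.25, 1.0))
```

A fourfold increase corresponds to two doublings, so the result for the hypothetical values should be close to half the incubation time.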

Preparation and hybridization of fluorescent labelled cDNA. For each
comparative array hybridization, labelled cDNA was synthesized by reverse
transcription from test cell mRNA in the presence of Cy5-dUTP, and from
the reference mRNA with Cy3-dUTP, using the Superscript II
reverse-transcription kit (Gibco-BRL). For each reverse transcription reaction, mRNA
(2 µg) was mixed with an anchored oligo-dT (d-20T-d(AGC)) primer (4 µg)
in a total volume of 15 µl, heated to 70 °C for 10 min and cooled on ice.
To this sample, we added an unlabelled nucleotide pool (0.6 µl; 25 mM
each dATP, dCTP, dGTP, and 15 mM dTTP), either Cy3 or Cy5 conjugated
dUTP (3 µl; 1 mM; Amersham), 5x first-strand buffer (6 µl; 250 mM
Tris-HCl, pH 8.3, 375 mM KCl, 15 mM MgCl2), 0.1 M DTT (3 µl) and 2 µl of
Superscript II reverse transcriptase (200 U/µl). After a 2-h incubation at
42 °C, the RNA was degraded by adding 1 N NaOH (1.5 µl) and incubating at
70 °C for 10 min. The mixture was neutralized by adding 1 N HCl (1.5 µl),
and the volume brought to 500 µl with TE (10 mM Tris, 1 mM EDTA).
We added Cot1 human DNA (20 µg; Gibco-BRL), and purified the probe
by centrifugation in a Centricon-30 micro-concentrator (Amicon). The
two separate probes were combined, brought to a volume of 500 µl, and
concentrated again to a volume of less than 7 µl. We added 10 µg/µl
poly(A) RNA (1 µl; Sigma) and tRNA (10 µg/µl; Gibco-BRL),
and adjusted the volume to 9.5 µl with distilled water. For final probe
preparation, 20xSSC (2.1 µl; 1.5 M NaCl, 150 mM NaCitrate, pH 8.0) and
10% SDS (0.35 µl) were added to a total final volume of 12 µl. The probes
were denatured by heating for 2 min at 100 °C, incubated at 37 °C for
20-30 min, and placed on the array under a 22 mm x 22 mm glass coverslip.
We incubated slides overnight at 65 °C for 14-18 h in a custom slide
chamber with humidity maintained by a small reservoir of 3xSSC. Arrays were
washed by submersion and agitation for 2-5 min in 2xSSC with 0.1% SDS,
followed by 1xSSC and then 0.1xSSC. The arrays were "spun dry" by
centrifugation in a slide-rack in a Beckman GS-6 tabletop centrifuge
in Microplus carriers at 650 r.p.m. for 2 min.



at http://rana.stanford.edu/software). Each spot was defined by manual
positioning of a grid of circles over the array image. For each fluorescent
image, the average pixel intensity within each circle was determined, and a
local background was computed for each spot equal to the median pixel
intensity in a square of 40 pixels in width and height centred on the spot
centre, excluding all pixels within any defined spots. Net signal was
determined by subtraction of this local background from the average intensity
for each spot. Spots deemed unsuitable for accurate quantitation because
of array artefacts were manually flagged and excluded from further
analysis. Data files generated by ScanAlyze were entered into a custom database
that maintains web-accessible files. Signal intensities between the two
fluorescent images were normalized by applying a uniform scale factor to all
intensities measured for the Cy5 channel. The normalization factor was
chosen so that the mean log(Cy3/Cy5) for a subset of spots that achieved a
minimum quality parameter (approximately 6,000 spots) was 0. This
effectively defined the signal-intensity-weighted 'average' spot on each array to
have a Cy3/Cy5 ratio of 1.0.
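
The normalization step described above might look like the following minimal sketch. It is not ScanAlyze code: the function names are hypothetical, and it assumes the inputs are net signals for spots that have already passed the quality filter and are positive in both channels.

```python
import math

def normalization_factor(cy3, cy5):
    """Scale factor for Cy5 that makes the mean log(Cy3/Cy5) over these spots 0."""
    logs = [math.log(a / b) for a, b in zip(cy3, cy5) if a > 0 and b > 0]
    if not logs:
        raise ValueError("no spots with positive net signal in both channels")
    return math.exp(sum(logs) / len(logs))

def normalized_ratios(cy3, cy5):
    """Cy3/Cy5 ratios after applying one uniform scale factor to every Cy5 value."""
    factor = normalization_factor(cy3, cy5)
    return [a / (b * factor) for a, b in zip(cy3, cy5)]

# Hypothetical net signals for two spots.
print(normalized_ratios([2.0, 8.0], [1.0, 1.0]))
```

Because the factor is the exponential of the mean log ratio, the geometric mean of the normalized ratios is 1, which is what a mean log ratio of 0 implies.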

Cluster analysis. We extracted tables (rows of genes, columns of individual 
microarray hybridizations) of normalized fluorescence ratios from the data- 
base. Various selection criteria, discussed in relation to each data set, were 
applied to select subsets of genes from the 9,703 cDNA elements on the 
arrays. Before clustering and display, the logarithm of the measured fluores- 
cence ratios for each gene were centred by subtracting the arithmetic mean of 
all ratios measured for that gene. The centring makes all subsequent analyses 
independent of the amount of each gene's mRNA in the reference pool. 
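
A minimal sketch of this centring step follows. The base of the logarithm is not stated in the text, so base 2 is an assumption here, and the function name and example ratios are hypothetical.

```python
import math

def centre_log_ratios(ratios):
    """Log-transform one gene's ratios and subtract that gene's mean log ratio."""
    logs = [math.log2(r) for r in ratios]  # base 2 is an assumption
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]

# Hypothetical ratios for one gene across three arrays. Tripling every ratio
# mimics a change in how much of this gene's mRNA is in the reference pool.
print(centre_log_ratios([1.0, 2.0, 4.0]))
print(centre_log_ratios([3.0, 6.0, 12.0]))
```

Scaling all of a gene's ratios by a constant shifts every log ratio by the same amount, so the two centred profiles agree up to rounding. This is why the centred data do not depend on the reference pool composition.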

We applied a hierarchical clustering algorithm separately to the cell lines 
and genes using the Pearson correlation coefficient as the measure of simi- 
larity and average linkage clustering 3 ' 19 * 2 '. The results of this process are 
two dendrograms (trees), one for the cell lines and one for the genes, in 
which very similar elements are connected by short branches, and longer 
branches join elements with diminishing degrees of similarity. For visual 
display the rows and columns in the initial data table were reordered to 
conform to the structures of the dendrograms obtained from the cluster 
analysis. Each cell in the cluster-ordered data table was replaced by a graded 
colour (pure red through black to pure green), representing the mean- 
adjusted ratio value in the cell. Gene labels in cluster diagrams are dis- 
played here only for genes that were represented in the microarray by 
sequence-verified cDNAs. A complete software implementation of this
process is available (http://rana.stanford.edu/software), as well as all
clustering results (http://genome-www.stanford.edu/nci60).
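
As a rough illustration of the clustering and reordering described above, the sketch below merges rows by Pearson correlation and returns them in dendrogram leaf order. It is not the authors' software: it assumes unweighted average linkage over all pairwise correlations, complete data with no constant rows, and hypothetical function names and example profiles.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage_order(rows):
    """Agglomerative clustering; returns row indices in dendrogram leaf order."""
    n = len(rows)
    sim = {(i, j): pearson(rows[i], rows[j])
           for i in range(n) for j in range(i + 1, n)}
    clusters = {i: [i] for i in range(n)}  # cluster id -> ordered member indices

    def avg_sim(a, b):
        # Mean correlation over every pair of members drawn from the two clusters.
        pairs = [(min(i, j), max(i, j)) for i in clusters[a] for j in clusters[b]]
        return sum(sim[p] for p in pairs) / len(pairs)

    next_id = n
    while len(clusters) > 1:
        ids = sorted(clusters)
        a, b = max(((p, q) for k, p in enumerate(ids) for q in ids[k + 1:]),
                   key=lambda pq: avg_sim(*pq))
        # Join the two most similar clusters, keeping their members adjacent.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return next(iter(clusters.values()))

# Hypothetical centred profiles: rows 0 and 2 rise together, rows 1 and 3 fall.
profiles = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 5], [5, 3, 2, 1]]
order = average_linkage_order(profiles)
print([profiles[i] for i in order])  # data table rows in cluster order
```

Applying the same function to the transposed table would order the columns (cell lines), after which each cell of the reordered table can be mapped to a colour.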



Array quantitation and data processing. Following hybridization, arrays 
were scanned using a laser-scanning microscope (ref. 17;
http://cmgm.stanford.edu/pbrown). Separate images were acquired for Cy3 and Cy5. We
carried out data reduction with the program ScanAlyze (M.B.E., available